
I’ve been doing a lot of experimenting with concurrent operations in haskell and in particular playing with and thinking about the design of concurrent FIFO queues. These structures are difficult to make both efficient and correct, due to the effects of contention on the parts of the structure tasked with coordinating reads and writes from multiple threads.

These are my thoughts so far on FIFO semantics.

In the interesting paper
“How FIFO is your concurrent FIFO queue?” (PDF),
A. Haas et al. propose that an ideal FIFO queue has operations that are
instantaneous (think of each `write` having an infinitely accurate timestamp,
and each `read` taking the corresponding element in timestamp order). They then
measure the degree to which *real* queues of various designs deviate from this
platonic FIFO semantics in their message ordering, using a metric they call
“element-fairness”. They experimentally measure the element-fairness of both
so-called “strict FIFO” as well as “relaxed FIFO” designs, in which elements
are read in more or less the order they were written (some providing guarantees
on the degree of re-ordering, others not).

The first interesting observation they make is that no queue actually exhibits FIFO semantics by their metric; this is because of the realities of the way atomic memory operations like CAS may arbitrarily reorder a set of contending writes.

The second interesting result is that the efficient-but-relaxed-FIFO queues which avoid contention by making fewer guarantees about message ordering often perform closer to ideal FIFO semantics (by their metric) than the “strict” but slower queues!

As an outsider reading papers on FIFO queue designs, I get the impression that what authors mean by “the usual FIFO semantics” is often ill-defined. Clearly they don’t mean the platonic zero-time semantics of the “How FIFO…” paper, since no real queue can be called FIFO by that measure.

I suspect what makes a queue “strict FIFO” (by the paper’s categorization) might simply be:

> If `write x` returns at time `T`, then `x` will be read before the elements of any `write`s that have not yet started by time `T`.

The idea is difficult to express, but is essentially that FIFO semantics is
only observable by way of actions taken by a thread *after* returning from a
`write` (think: thread `A` writes `x`, then tells `B`, which writes `y`, where
our program’s correctness depends on the queue returning `y` after `x`). Note
that since a queue starts empty, this is also sufficient to ensure writes don’t
“jump ahead” of writes already in the queue.
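This observable criterion can be sketched with the standard `Control.Concurrent.Chan` (my own example; `Chan` is a strict FIFO in this sense): `A`’s write completes before `B`’s write even begins, so a reader must see `A`’s element first.

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.Chan (newChan, writeChan, readChan)

-- `A` writes "x" and only then signals `B`, which writes "y"; since the
-- write of "x" returned before the write of "y" started, an observably-FIFO
-- queue must yield "x" first.
firstRead :: IO String
firstRead = do
    q    <- newChan
    done <- newEmptyMVar
    _ <- forkIO $ do writeChan q "x"   -- thread A writes x ...
                     putMVar done ()   -- ... then tells B
    _ <- forkIO $ do takeMVar done     -- thread B waits for A ...
                     writeChan q "y"   -- ... then writes y
    readChan q                         -- always "x"
```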

Imagine an absurd queue whose `write` never returns; there’s very little one
can say for certain about the “correct” FIFO ordering of writes in that case,
especially when designing a program with a preempting scheduler that’s meant to
be portable. Indeed the correctness criterion above is actually probably a lot
stricter than many programs require; e.g. when there is no coordination between
writers, an observably-FIFO queue need only ensure that no reader thread sees
two messages from the same writer thread out of order (I think).

The platonic zero-time FIFO ordering criterion used in the paper is quite different from this observable, correctness-preserving FIFO criterion; I can imagine it being useful for people designing “realtime” software.

**Update 04/15/2014**:

What I’m trying to describe here is called
*linearizability*,
and is indeed a well-understood and common way of thinking about the semantics
of concurrent data structures; somehow I missed or misunderstood the concept!

At a certain level of abstraction, correct observable FIFO semantics shouldn’t be hard to make efficient; after all, the moments during which we have contention (and horrible performance) are also the moments during which we don’t care about (or have no way of observing) correct ordering. In other words (although we have to be careful of the details) a thread-coordination scheme that breaks down (w/r/t element-fairness) under contention isn’t necessarily a problem. Compare-and-swap does just that; unfortunately it breaks down in a way that is slower rather than faster.

This is the first *real* release of `shapely-data`, a haskell library
up here on hackage
for working with algebraic datatypes in a simple generic form made up of
haskell’s primitive product, sum and unit types: `(,)`, `Either`, and `()`.

You can install it with

```
cabal install shapely-data
```

In order from most to least important to me, here are the concerns that motivated the library:

1. Provide a good story for `(,)`/`Either` as a *lingua franca* generic
   representation that other library writers can use without dependencies,
   encouraging abstractions in terms of products and sums (motivated
   specifically by my work on `simple-actors`).

2. Support algebraic operations on ADTs, making types composable:

   ```
   -- multiplication:
   let a  = (X,(X,(X,())))
       b  = Left (Y,(Y,())) :: Either (Y,(Y,())) (Z,())
       ab = a >*< b
    in ab == ( Left (X,(X,(X,(Y,(Y,())))))
                 :: Either (X,(X,(X,(Y,(Y,()))))) (X,(X,(X,(Z,())))) )

   -- exponents, etc.:
   fanout (head,(tail,(Prelude.length,()))) [1..3]
       == (1,([2,3],(3,())))
   (unfanin (_4 `ary` (shiftl . Sh.reverse)) 1 2 3 4)
       == (3,(2,(1,(4,()))))
   ```

3. Support powerful, typed conversions between `Shapely` types:

   ```
   data F1 = F1 (Maybe F1) (Maybe [Int]) deriving Eq
   data F2 = F2 (Maybe F2) (Maybe [Int]) deriving Eq

   f2 :: F2
   f2 = coerce (F1 Nothing $ Just [1..3])

   data Tsil a = Snoc (Tsil a) a | Lin deriving Eq

   truth = massage "123" == Snoc (Snoc (Snoc Lin '3') '2') '1'
   ```

4. Lowest on the list is supporting abstracting over different recursion
   schemes or supporting generic traversals and folds, though some basic
   support is planned.

Finally, in at least some cases this can completely replace `GHC.Generics` and
may be a bit simpler. See `examples/Generics.hs` for an example of the
`GHC.Generics` wiki example ported to `shapely-data`. And for a nice view on
the changes that were required, do:

```
git show 3a65e95 | perl /usr/share/doc/git/contrib/diff-highlight/diff-highlight
```

The `GHC.Generics` representation has a lot of metadata and a complex
structure that can be useful in deriving default instances; more important to
us is to have a simple, canonical representation such that two types that
differ only in constructor names can be expected to have identical generic
representations.

This supports APIs that are type-agnostic (e.g. a database library that returns
a generic `Product`, convertible later with `to`), and allows us to define
algebraic operations and composition & conversion functions.

In this post Philip Nilsson describes an inspiring, principled approach to solving a toy problem posed in a programming interview. I wanted to implement a solution to a variant of the problem where we’d like to process a stream. It was pretty easy to sketch a solution out on paper but Philip’s solution was invaluable in testing and debugging my implementation. (See also Chris Done’s mind-melting loeb approach)

My goal was to have a function:

```
waterStream :: [Int] -> [Int]
```

that would take a possibly-infinite list of columns and return a stream of
*known water quantities*, where volumes of water were output as soon as
possible. We can get a solution to the original problem, then, with

```
ourWaterFlow = sum . waterStream
```

Here is the solution I came up with, with inline explanation:

```
{-# LANGUAGE BangPatterns #-}

-- start processing `str`, initializing the highest column to the left at 0,
-- and an empty stack:
waterStream :: [Int] -> [Int]
waterStream str = processWithMax 0 str []

processWithMax :: Int -> [Int] -> [(Int,Int)] -> [Int]
processWithMax prevMax = process
  where
    process [] = const []
    -- output the quantity of water we know we can get, given the column at
    -- the head of the stream, `y`:
    process (y:ys) = eat 1
      where
        eat !n xxs@((offset,x):xs)
            -- done with `y`; push it and its offset onto the stack
            | y < x     = process ys ((n,y):xxs)
            -- at each "rise" we can output some known quantity of water;
            -- storing the "offset" as we did above lets us calculate water
            -- above a previously filled "valley"
            | otherwise = let col  = offset*(min y prevMax - x)
                              cols = eat (n+offset) xs
                          -- filter out zeros:
                          in if col == 0 then cols else col : cols
        -- if we got to the end of the stack, then `y` is the new highest
        -- column we've seen.
        eat !n [] = processWithMax y ys [(n,y)]
```

The bit about “offsets” is the tricky part which I don’t know how to explain without a pretty animation.

It took me much longer than I was expecting to code up the solution above that
worked on a few hand-drawn test cases, and at that point I didn’t have high
confidence that the code was correct, so I turned to
quickcheck and `assert`.

First I wanted to make sure the invariant that the “column” values in the stack were strictly increasing held:

```
import Control.Exception (assert)
...
--process (y:ys) = eat 1
process (y:ys) stack = assert (stackSane stack) $ eat 1 stack
...
```

Then I used Philip’s solution (which I had confidence in):

```
waterFlow :: [Int] -> Int
waterFlow h = sum $
    zipWith (-)
        (zipWith min (scanl1 max h) (scanr1 max h))
        h
```
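To see what the formula above is computing, here is a small worked trace on a hypothetical input (my own example, not from either post):

```haskell
-- For the columns [2,1,0,1,3]:
h, leftMax, rightMax, level, water :: [Int]
h        = [2,1,0,1,3]
leftMax  = scanl1 max h                  -- [2,2,2,2,3]: tallest wall at or left of each column
rightMax = scanr1 max h                  -- [3,3,3,3,3]: tallest wall at or right of each column
level    = zipWith min leftMax rightMax  -- [2,2,2,2,3]: the water level over each column
water    = zipWith (-) level h           -- [0,1,2,1,0]: water held above each column
-- sum water == 4
```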

to test my implementation:

```
*Waterflow> import Test.QuickCheck
*Waterflow Test.QuickCheck> quickCheck (\l -> waterFlow l == ourWaterFlow l)
*** Failed! Falsifiable (after 21 tests and 28 shrinks):
[1,0,0,0,1]
```

Oops! It turned out I had a bug in this line (fixed above):

```
--old buggy:
--cols = eat (n+1) xs
--new fixed:
cols = eat (n+offset) xs
```

The solution seems to perform pretty well, processing 1,000,000 `Int`s in 30ms
on my machine:

```
import Control.Monad (replicateM)
import Criterion.Main
import System.Random.MWC (create, uniformR)

main = do
    gen <- create
    rs  <- replicateM 1000000 $ uniformR (0,100) gen
    defaultMain [ bench "ourWaterFlow" $ whnf ourWaterFlow rs ]
```

I didn’t get a good look at space usage over time, as I was testing with
`mwc-random`, which doesn’t seem to support creating a lazy infinite list of
randoms, and I didn’t want to hunt down another library. Obviously on a stream
that simply descends forever, our stack of `(Int,Int)` will grow to infinite
size.

It seems as though there is a decent amount of parallelism that could be exploited in this problem, but I didn’t have any luck on a quick attempt.

Have a parallel solution, or something just faster? Or an implementation that doesn’t need a big stack of previous values?

`TypeFamilies` is a GHC extension that lets you create formerly-impossible
abstractions in a very straightforward way. It took me several tries before
they clicked for me though, so this is the introduction to `TypeFamilies` that
I wish I had read first (although I just found Brent Yorgey’s, which would have
done the trick).

I’m treating the subject very narrowly for most of this post, and try to round things out a little at the very end.

If this isn’t the first thing you’ve read about `TypeFamilies`, it might be
helpful to forget a few things. The question “what precisely is a type family?”
isn’t going to be very helpful; in general, the terminology for the
constellation of constructions that `TypeFamilies` gives you is a huge mess,
with multiple partially-overlapping terms in the wild, none of which are
helpful for developing an intuition about what all of this is about.

I also found various analogies I’d read to be useless, so forget those too.

Consider the familiar type synonym:

```
type PairOf a = (a,a)
```

Normally this is presented as syntactic sugar, with little to do with the type system.

A more interesting way of thinking about `PairOf` is as a *function* (as
suggested by the `=`), where evaluation involves substituting occurrences of
the left hand side (LHS) with the right, in the usual way. These functions are
evaluated *in your type signatures* at compile time.

The analogous regular term-level function would of course be:

```
pairOf a = (a,a)
```

Simple enough. Now let’s think about a simple *term-level* function, and see
what an analogous *type-level* type synonym/function might look like:

```
last (a: []) = a
last (a: (b:bs)) = last (b:bs)
```

For our type-level `Last` we need something like lists at the type level, so
we’ll use the common nested-tuple representation, with `(,)` as cons and `()`
as the empty list, e.g.:

```
x :: (Int,(Int,(Int,()))) -- like [1,2,3]
```

Hopefully I didn’t just lose you. Remember for now we just care about using
this list-like tuple thing in our *type signatures*.

If you were to charge ahead and try to define `Last` using type synonyms
treated as full blown functions, you might come up with:

```
-- this isn't okay:
type Last (a, ()) = a
type Last (a, (b,bs)) = Last (b,bs)
```

Unfortunately the compiler will laugh at you. Type synonyms can only have abstract variable arguments on the LHS where above we have tried to deconstruct them using pattern matching, and to define a different RHS for both cases. Further we’ve made the definition recursive. None of that is okay.

In fact the humble type synonym is only a very simple sort of function (a natural transformation or something close) which is very easy to evaluate, but also very limited.

The `TypeFamilies` extension lets us define `Last` successfully, almost exactly
as we did above.

```
{-# LANGUAGE TypeFamilies #-}
-- we have to "declare" `Last` separately, and the "family"
-- here distinguishes the syntax from a normal type synonym:
type family Last l
-- ...and then can define our "cases":
type instance Last (a,()) = a
type instance Last (a,(b,bs)) = Last (b,bs)
```

At this point when the type-checker sees `Last (a,(b,bs))` in a type signature
it will replace it with `Last (b,bs)`, and continue until all of these “type
functions” are evaluated. I may be fudging things a bit, but that’s the
general idea.

Since these are a more general sort of type function, they can even be used to replace traditional type synonyms:

```
type family PairOf a
type instance PairOf a = (a,a)
```

It would be neat to be able to work with “lists” that look like e.g.
`(1,('a',("hello",())))`; they are heterogeneous, operations like `head` would
be type-safe, etc. So imagine we want to define a `last` on this list-like
tuple sort of data.

What would the type of `last` look like? We know it has to be polymorphic,
since its arguments might look like `()` or `(1,(2,()))`, different types of
course. So we’ll need a type-class (and a couple other standard extensions):

```
{-# LANGUAGE FlexibleInstances, FlexibleContexts #-}
class LastOfTupleList l where
    last' :: l -> Last l -- < we use our `Last` type function
```

Our instances are trivial:

```
instance LastOfTupleList (a,()) where
    last' (a,()) = a -- < the type-checker can see that, indeed, `a`
                     --   is equal to `Last (a,())`

instance (LastOfTupleList (b, bs))=> LastOfTupleList (a,(b,bs)) where
    last' (a,(b,bs)) = last' (b,bs)
```

Letting us do:

```
>>> last' (1,(2,()))
2
>>> last' (1,('a',("hello",())))
"hello"
```
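For reference, the snippets above assemble into one small compiling module (with the extensions noted earlier), which may be handy for experimenting in GHCi:

```haskell
{-# LANGUAGE TypeFamilies, FlexibleInstances, FlexibleContexts #-}

-- the type function:
type family Last l
type instance Last (a,()) = a
type instance Last (a,(b,bs)) = Last (b,bs)

-- and the value-level counterpart, mirroring its structure:
class LastOfTupleList l where
    last' :: l -> Last l

instance LastOfTupleList (a,()) where
    last' (a,()) = a

instance (LastOfTupleList (b,bs)) => LastOfTupleList (a,(b,bs)) where
    last' (a,(b,bs)) = last' (b,bs)
```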

Notice how our instances of `Last` and `last'` have almost the same structure;
this is very common.

If you’re a programmer type you were probably irritated by my initial clumsy
definition of `last`; why not:

```
last (a:[]) = a
last (a:as) = last as -- `as` matches whatever can fall through the pattern
-- above; in this case only non-empty lists
```

Well, I was being sneaky because the type-level analogue isn’t allowed!

```
type instance Last (a,()) = a
type instance Last (a,as) = Last as -- BAD!
```

This is because unlike functions, type families are *open*, meaning that, like
typeclasses, a new instance can be added at any moment. Therefore there’s
no way to define a “default” instance to use after all other matches fail;
you simply get illegal overlapping synonym instances such as the one above,
and the order in which we defined the two doesn’t matter.

For some use cases this is what we need; for others (such as our `last'`) we’d
really prefer that type synonym families be *closed* so that we can pattern
match in the usual way. This feature is apparently coming soon.

That should give you an intuition. At this point you might want to stop and read through the following documentation, or continue reading below before coming back to these links for a more refined understanding and additional details:

Since we so often define a type class in terms of one or more type families, we’re given a simplified syntax for combining them in one place.

```
class LastOfTupleList l where
    type Last l -- an *associated* type family
    last' :: l -> Last l

instance LastOfTupleList (a,()) where
    type Last (a,()) = a
    last' (a,()) = a
```

When people say “associated types” they mean type functions that are associated with a typeclass using the syntax above.

Type synonym family instances are said to be *not injective*, meaning two
different instances of the same type function can map to the same type on the
RHS, e.g.

```
type instance F Int = Bool
type instance F Char = Bool
```

It’s easy to forget this when building new abstractions, and assume that the
typechecker will infer, from the RHS (e.g. `Bool` above), the argument passed
in to the type function (`Int` or `Char`).

I’ve completely focused on the *type synonym* flavor of `TypeFamilies` above,
but there is also a `data`/`newtype` flavor in which, for each instance
definition, the RHS is a *brand new* type declaration, rather than mapping to
an existing type:

```
-- from http://www.haskell.org/ghc/docs/7.4.1/html/users_guide/type-families.html
data family T a
data instance T Int = T1 Int | T2 Bool -- new constructors T1 and T2 defined here
newtype instance T Char = TC Bool
```
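To make that concrete, here’s a minimal sketch of my own, building on the users-guide example above: each instance’s constructors are ordinary constructors for that particular instance type, so we can pattern match on them as usual.

```haskell
{-# LANGUAGE TypeFamilies #-}

data family T a
data instance T Int = T1 Int | T2 Bool -- new constructors T1 and T2
newtype instance T Char = TC Bool

-- ordinary pattern matching on the constructors of the `T Int` instance:
fromTInt :: T Int -> Int
fromTInt (T1 n) = n
fromTInt (T2 b) = if b then 1 else 0
```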

Because each instance maps to a unique type, data families *are* injective,
allowing the type checker to infer the LHS of the equation from the RHS.

A couple of final notes:

- `TypeFamilies` provides the syntax `a ~ b` to indicate type equality
  constraints; this is especially useful with type synonym functions, but can
  be useful on its own as well.
- Kind signatures are required for type functions on types taking arguments,
  e.g. `Maybe`.
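As a minimal illustration of that last point (a hypothetical `Wrap` of my own invention, not part of any library): a type-function argument that is itself a type constructor, like `Maybe`, needs its kind spelled out.

```haskell
{-# LANGUAGE TypeFamilies, KindSignatures #-}

-- without the `(f :: * -> *)` kind signature, GHC would assume `f` has
-- kind `*` and reject the instances below:
type family Wrap (f :: * -> *) a
type instance Wrap Maybe a = Maybe a
type instance Wrap []    a = [a]

wrapped :: Wrap Maybe Int -- i.e. `Maybe Int`
wrapped = Just 1
```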

After pouring 1.5 bottles of horrid caustic chemicals down the drain to no avail, I found I was able to clear a tough clogged shower drain with just the empty bottle:

- pour kettle full of boiling water into drain
- stuff overflow drain with plastic wrap (so that pressure is directed toward the clog, not out the top)
- invert empty draino bottle placing mouth around drain, forming a tight seal
- squeeze bottle rapidly and forcefully

Don’t get boiling water in your eyes.

*Disclaimer: un-researched, under-educated, fast and loose hacking follows.*

The other day I was thinking about counting in binary and how bits cascade when incrementing a counter.

```
000
001
010
011
100
```

I wondered: what if each bit flip was very “expensive”, for the hardware say? Are there other methods we could use that would result in a “cheaper” increment operation, by avoiding those costly carries?

First, here’s a little bit of boring code that we’ll use below to print a list of Ints as padded binary, and some requisite imports:

```
import Data.Bits
import Data.Word
import Numeric
import Text.Printf
import Data.Char

prettyPrintBits :: (Integral i, Show i)=> [i] -> IO ()
prettyPrintBits = mapM_ putStrLn . prettyListBits

prettyListBits :: (Integral i, Show i)=> [i] -> [String]
prettyListBits l = map (printPadded . toBin) l where
    toBin i = showIntAtBase 2 intToDigit i ""
    printPadded = printf ("%0"++width++"s")
    width = show $ ceiling $ logBase 2 $ (+1) $ fromIntegral $ maximum l
```

Anyway I started playing with this by hand, with a 3-bit example, and drawing funny pictures like these:

On the left is a table with the cost, in bit-flips, to increment a binary counter to that number, from the previous.

Then (in the middle of the pic above) I drew an undirected graph with edges connecting the numbers that differed by only a single bit (lots of symmetry there, no?); so what we’d like to do is walk that graph without repeating, to cycle through all the 3-bit combinations as “efficiently” as possible.

After doodling for a second I found a nice symmetrical path through the graph, and wrote the numbers we pass through in sequence (on the right side of the pic above).

Okay, but can we write a program to generate that sequence? And generalize it?

After I marked the bit-flip locations in the “bit-economical” binary sequence, I noticed the pattern from top to bottom is the same as what we’d use to build a binary tree from an ordered list, numbered from the bottom up.

Furthermore glancing back to my original table of “increment costs” I noticed they follow the same pattern; it turns out we can use a few simple bitwise operations on a normal binary count to enumerate all n-bit numbers with optimally-efficient bit flips. The algorithm is:

- start the sequence at 0
- for a count 0,1,2,… do:
  - find the difference (`XOR`) of adjacent numbers
  - find the number of `1`s in the result (`POPCNT`)
  - flip the bit at that offset to get to the next number in the sequence

Here’s an implementation in haskell that represents an infinite (sort of) efficient bit count:

```
lazyBits :: [Word]
lazyBits = scanl complementBit 0 bitsToFlip where
    ns = [0..] :: [Word]
    bitsToFlip = zipWith (\x-> subtract 1 . popCount . xor x) ns $ tail ns
```

We can play with this & pretty-print the result:

```
*Main Text.Printf> take (2^3) $ lazyBits
[0,1,3,2,6,7,5,4]
*Main Text.Printf> prettyPrintBits $ take (2^3) $ lazyBits
000
001
011
010
110
111
101
100
```
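As a sanity check (an observation of mine, not part of the original doodling), the n-th element of this sequence also matches the closed form `xor n (shiftR n 1)`, which reproduces the eight values above:

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word)

-- the n-th element of the sequence, computed directly:
closedForm :: Word -> Word
closedForm n = n `xor` (n `shiftR` 1)

-- map closedForm [0..7] == [0,1,3,2,6,7,5,4]
```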

This is similar to de Bruijn sequences and I’m sure this sort of sequence has some interesting applications. For instance just as with de Bruijn sequences I could see this being useful in cracking an imaginary lock consisting of toggling push-button digits.

But most real locks like that have buttons that lock in place and have a single
“reset” button. To model those we could do something similar to what we just
did, only we’d want to study a directed graph where we only ever have edges
resulting from flipping a `0` to a `1`, and all numbers have an edge back to
`0`.

We’ve gotten side-tracked a bit. We started wanting to find a more efficient binary number system for incrementing, but is that what we got?

Well, no. In particular we have no algorithm for incrementing an arbitrary
number in our sequence. For instance, given only the number `010` we have
no way (that I’ve been able to figure out) of knowing that the left-most bit
should be flipped. In other words we need *some* amount of state to determine
the next bit to be flipped. At least that’s my assumption.

One thing we *do* have is a method for comparing numbers in the sequence,
something that’s pretty much essential if you want to do anything with a
counter.

Here is some code to do just that, defined here in terms of bit strings (as
output by `prettyListBits`); I haven’t figured out a clever set of bitwise ops
to do this, and have a somewhat foggy idea of why this works:

```
compareNums :: String -> String -> Ordering
compareNums [] [] = EQ
compareNums ('0':as) ('0':bs) = compareNums as bs
compareNums ('1':as) ('1':bs) = invert $ compareNums as bs
    where invert EQ = EQ
          invert GT = LT
          invert LT = GT
compareNums (a:_) (b:_) = compare a b
compareNums _ _ = error "this assumes 0-padded, aligned numbers"
```
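As a quick check (repeating the definition above so this snippet runs standalone), consecutive numbers from the 3-bit sequence earlier do come out in increasing order:

```haskell
compareNums :: String -> String -> Ordering
compareNums [] [] = EQ
compareNums ('0':as) ('0':bs) = compareNums as bs
compareNums ('1':as) ('1':bs) = invert $ compareNums as bs
    where invert EQ = EQ
          invert GT = LT
          invert LT = GT
compareNums (a:_) (b:_) = compare a b
compareNums _ _ = error "this assumes 0-padded, aligned numbers"

-- the 3-bit sequence from earlier, in order:
seq3 :: [String]
seq3 = ["000","001","011","010","110","111","101","100"]

-- every adjacent pair compares LT:
allIncreasing :: Bool
allIncreasing = all (== LT) $ zipWith compareNums seq3 (tail seq3)
```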

So how much does it really cost *on average* to increment an arbitrary number
in the usual binary number system? We know it’s more than one, but we need a
function from bit length to average cost.

To do that efficiently we’ll once again use the pattern we discovered above when we sketched out a 3-bit example. Looking at the way the different costs break down we have:

```
4 operations at cost 1
2 operations at cost 2
1 operation at cost 3
```

…out of a total of `8 - 1` increment operations, so the average cost is:

```
(4*1 + 2*2 + 1*3) / (8 - 1) = 1.5714
```

So a general function for `b` bits would look like:

```
amortizedIncrCost :: Integer -> Double
amortizedIncrCost b = avgd $ sum $ zipWith (*) (iterate (2*) 1) [b,b-1 ..1]
    where avgd n = fromIntegral n / (2^b - 1)
```
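Actually, that limit can be taken (a derivation of mine, not in the original): the weighted sum above has the closed form 2^(b+1) - b - 2, so the average is (2^(b+1) - b - 2) / (2^b - 1), which approaches exactly 2 as `b` grows:

```haskell
-- the same function via the closed form of the sum:
amortizedIncrCost' :: Integer -> Double
amortizedIncrCost' b = fromIntegral (2^(b+1) - b - 2) / (2^b - 1)

-- e.g. for b = 3: (16 - 3 - 2) / 7 == 11/7, agreeing with the table above
```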

We’d like to know how this behaves as our number of bits `b` approaches
infinity. We could take the limit of the function, but who knows how to do
that, so let’s just experiment:

```
*Main> amortizedIncrCost 3
1.5714285714285714
*Main> amortizedIncrCost 4
1.7333333333333334
*Main> amortizedIncrCost 8
1.968627450980392
*Main> amortizedIncrCost 16
1.9997558556496529
*Main> amortizedIncrCost 32
1.9999999925494194
*Main> amortizedIncrCost 64
2.0
```

It seems quite clearly to approach 2 as we exceed the accuracy of our `Double`;
i.e. the average cost to increment an arbitrary-size binary counter is
two bit flips. Interesting!

Remember above how we mentioned our counting system required some additional state to determine the next bit to flip? Well a single bit is the smallest unit of state we know about; therefore even if we could figure out how to determine bit-flips with a single bit of state, or could derive a way of calculating the offset from the number itself, we’d still never be able to do better than two bit-flips per increment.

This is probably all pretty obvious information theory stuff, but was a fun little diversion.