Thoughts on FIFO-ness

I’ve been doing a lot of experimenting with concurrent operations in haskell and in particular playing with and thinking about the design of concurrent FIFO queues. These structures are difficult to make both efficient and correct, due to the effects of contention on the parts of the structure tasked with coordinating reads and writes from multiple threads.

These are my thoughts so far on FIFO semantics.

FIFO? And how!

In the interesting paper “How FIFO is your concurrent FIFO queue?”(PDF). A Haas, et al. propose that an ideal FIFO queue has operations that are instantaneous (think of each write having an infinitely accurate timestamp, and each read taking the corresponding element in timestamp order). They then measure the degree to which real queues of various designs deviate from this platonic FIFO semantics in their message ordering, using a metric they call “element-fairness”. They experimentally measure element-fairness of both so-called “strict FIFO” as well as “relaxed FIFO” designs, in which elements are read in more or less the order they were written (some providing guarantees of degree of re-ordering, others not).

The first interesting observation they make is that no queue actually exhibits FIFO semantics by their metric; this is because of the realities of the way atomic memory operations like CAS may arbitrarily reorder a set of contentious writes.

The second interesting result is that the efficient-but-relaxed-FIFO queues which avoid contention by making fewer guarantees about message ordering often perform closer to ideal FIFO semantics (by their metric) than the “strict” but slower queues!

Observable FIFO Semantics

As an outsider, reading papers on FIFO queue designs I get the impression that what authors mean by “the usual FIFO semantics” is often ill-defined. Clearly they don’t mean the platonic zero-time semantics of the “How FIFO… " paper, since they can’t be called FIFO by that measure.

I suspect what makes a queue “strict FIFO” (by the paper’s categorization) might simply be

If write x returns at time T, then x will be read before the elements of any writes that have not yet started by time T.

The idea is difficult to express, but is essentially that FIFO semantics is only observable by way of actions taken by a thread after returning from a write (think: thread A writes x, then tells B which writes y, where our program’s correctness depends on the queue returning y after x). Note that since a queue starts empty this is also sufficient to ensure writes don’t “jump ahead” of writes already in the queue.

Imagine an absurd queue whose write never returns; there’s very little one can say for certain about the “correct” FIFO ordering of writes in that case, especially when designing a program with a preempting scheduler that’s meant to be portable. Indeed the correctness criterion above is actually probably a lot stricter than many programs require; e.g. when there is no coordination between writers, an observably-FIFO queue need only ensure that no reader thread sees two messages from the same writer thread out of order (I think).

The platonic zero-time FIFO ordering criterion used in the paper is quite different from this observable, correctness-preserving FIFO criterion; I can imagine it being useful for people designing “realtime” software.

Update 04/15/2014:

What I’m trying to describe here is called linearizability, and is indeed a well-understood and common way of thinking about the semantics of concurrent data structures; somehow I missed or misunderstood the concept!

Conclusion

At a certain level of abstraction, correct observable FIFO semantics shouldn’t be hard to make efficient; after all, the moments during which we have contention (and horrible performance) are also the moments during which we don’t care about (or have no way of observing) correct ordering. In other words (although we have to be careful of the details) a thread-coordination scheme that breaks down (w/r/t element-fairness) under contention isn’t necessarily a problem. Compare-and-swap does just that, unfortunately it breaks down in a way that is slower rather than faster.