Thoughts on FIFO-ness
Last updated: Dec 11, 2022
I’ve been doing a lot of experimenting with concurrent operations in haskell and in particular playing with and thinking about the design of concurrent FIFO queues. These structures are difficult to make both efficient and correct, due to the effects of contention on the parts of the structure tasked with coordinating reads and writes from multiple threads.
These are my thoughts so far on FIFO semantics.
FIFO? And how!
In the interesting paper
“How FIFO is your concurrent FIFO queue?”(PDF).
A Haas, et al. propose that an ideal FIFO queue has operations that are
instantaneous (think of each write
having an infinitely accurate timestamp,
and each read
taking the corresponding element in timestamp order). They then
measure the degree to which real queues of various designs deviate from this
platonic FIFO semantics in their message ordering, using a metric they call
“element-fairness”. They experimentally measure element-fairness of both
so-called “strict FIFO” as well as “relaxed FIFO” designs, in which elements
are read in more or less the order they were written (some providing guarantees
of degree of re-ordering, others not).
The first interesting observation they make is that no queue actually exhibits FIFO semantics by their metric; this is because of the realities of the way atomic memory operations like CAS may arbitrarily reorder a set of contentious writes.
The second interesting result is that the efficient-but-relaxed-FIFO queues which avoid contention by making fewer guarantees about message ordering often perform closer to ideal FIFO semantics (by their metric) than the “strict” but slower queues!
Observable FIFO Semantics
As an outsider, reading papers on FIFO queue designs I get the impression that what authors mean by “the usual FIFO semantics” is often ill-defined. Clearly they don’t mean the platonic zero-time semantics of the “How FIFO… " paper, since they can’t be called FIFO by that measure.
I suspect what makes a queue “strict FIFO” (by the paper’s categorization) might simply be
If
write x
returns at timeT
, thenx
will be read before the elements of anywrite
s that have not yet started by timeT
.
The idea is difficult to express, but is essentially that FIFO semantics is
only observable by way of actions taken by a thread after returning from a
write
(think: thread A
writes x
, then tells B
which writes y
, where
our program’s correctness depends on the queue returning y
after x
). Note
that since a queue starts empty this is also sufficient to ensure writes don’t
“jump ahead” of writes already in the queue.
Imagine an absurd queue whose write
never returns; there’s very little one
can say for certain about the “correct” FIFO ordering of writes in that case,
especially when designing a program with a preempting scheduler that’s meant to
be portable. Indeed the correctness criterion above is actually probably a lot
stricter than many programs require; e.g. when there is no coordination between
writers, an observably-FIFO queue need only ensure that no reader thread sees
two messages from the same writer thread out of order (I think).
The platonic zero-time FIFO ordering criterion used in the paper is quite different from this observable, correctness-preserving FIFO criterion; I can imagine it being useful for people designing “realtime” software.
Update 04/15/2014:
What I’m trying to describe here is called linearizability, and is indeed a well-understood and common way of thinking about the semantics of concurrent data structures; somehow I missed or misunderstood the concept!
Conclusion
At a certain level of abstraction, correct observable FIFO semantics shouldn’t be hard to make efficient; after all, the moments during which we have contention (and horrible performance) are also the moments during which we don’t care about (or have no way of observing) correct ordering. In other words (although we have to be careful of the details) a thread-coordination scheme that breaks down (w/r/t element-fairness) under contention isn’t necessarily a problem. Compare-and-swap does just that, unfortunately it breaks down in a way that is slower rather than faster.