```
$ cabal install unagi-bloomfilter
```

The library uses the *bloom-1* variant from “Fast Bloom Filters and Their
Generalization” by Yan Qiao, et al. I’ll try to write more about it when I have
the time. Also I just gave a talk on things I learned working on the project
last night at the New York Haskell User Group:

```
http://www.meetup.com/NY-Haskell/events/233372271/
```

It was quite rough, but I was happy to hear from folks that found some interesting things to take away from it.

Thanks to Gershom for inviting me to speak, for my company Signal Vine for sponsoring my trip out, and to Yan Qiao for generously answering my silly questions and helping me understand the paper.

Signal Vine is an awesome group of people, with interesting technology and problems to solve, and we’re looking to grow the small development team. If you have some experience with haskell (you don’t have to be a guru) and are interested, please reach out to Jason or me at:

```
brandon@signalvine.com
jason@signalvine.com
```

]]>```
cabal install hashabler
```

(see my initial announcement post which has some motivation and pretty pictures)

You can see the CHANGELOG but the main change is an implementation of SipHash. It’s about as fast as our implementation of FNV-1a for bytestrings of length fifty and slightly faster when you get to length 1000 or so, so you should use it unless you’re wanting a hash with a simple implementation.

If you’re implementing a new hashing algorithm or hash-based data structure, please consider using hashabler instead of hashable.

]]>`hashabler`

and wanted to share
a nice way I found to translate stateful bit-twiddling code in C (which makes
heavy use of bitwise assignment operators) to haskell.
I was working from the
reference implementation.
As you can see statefulness and mutability are an implicit part of how the
algorithm is defined, as it modifies the states of the `v`

variables.

```
#define SIPROUND \
do { \
v0 += v1; v1=ROTL(v1,13); v1 ^= v0; v0=ROTL(v0,32); \
v2 += v3; v3=ROTL(v3,16); v3 ^= v2; \
v0 += v3; v3=ROTL(v3,21); v3 ^= v0; \
v2 += v1; v1=ROTL(v1,17); v1 ^= v2; v2=ROTL(v2,32); \
} while(0)
int siphash( uint8_t *out, const uint8_t *in, uint64_t inlen, const uint8_t *k )
{
/* ... */
for ( ; in != end; in += 8 )
{
m = U8TO64_LE( in );
v3 ^= m;
TRACE;
for( i=0; i<cROUNDS; ++i ) SIPROUND;
v0 ^= m;
}
```

I wanted to translate this sort of code as directly as possible (I’d already decided if it didn’t work on the first try I would burn my laptop and live in the woods, rather than debug this crap).

First we’ll use name shadowing to “fake” our mutable variables, making it easy to ensure we’re always dealing with the freshest values.

```
{-# OPTIONS_GHC -fno-warn-name-shadowing #-}
```

We’ll also use `RecordWildCards`

to make it easy to capture the “current state”
of these values, through folds and helper functions.

```
{-# LANGUAGE RecordWildCards #-}
```

And finally we use the trivial `Identity`

monad
(this trick I learned from Oleg)
which gets us the proper scoping we want for our `v`

values:

```
import Data.Functor.Identity
```

Here’s a bit of the haskell:

```
siphash :: Hashable a => SipKey -> a -> Word64
siphash (k0,k1) = \a-> runIdentity $ do
let v0 = 0x736f6d6570736575
v1 = 0x646f72616e646f6d
v2 = 0x6c7967656e657261
v3 = 0x7465646279746573
...
v3 <- return $ v3 `xor` k1;
v2 <- return $ v2 `xor` k0;
v1 <- return $ v1 `xor` k1;
v0 <- return $ v0 `xor` k0;
...
-- Initialize rest of SipState:
let mPart = 0
bytesRemaining = 8
inlen = 0
SipState{ .. } <- return $ hash (SipState { .. }) a
let !b = inlen `unsafeShiftL` 56
v3 <- return $ v3 `xor` b
-- for( i=0; i<cROUNDS; ++i ) SIPROUND;
(v0,v1,v2,v3) <- return $ sipRound v0 v1 v2 v3
(v0,v1,v2,v3) <- return $ sipRound v0 v1 v2 v3
v0 <- return $ v0 `xor` b
...
(v0,v1,v2,v3) <- return $ sipRound v0 v1 v2 v3
return $! v0 `xor` v1 `xor` v2 `xor` v3
```

If you were really doing a lot of this sort of thing, you could even make a simple quasiquoter that could translate bitwise assignment into code like the above.

]]>```
cabal install hashabler
```

`hashabler`

is a rewrite of the hashable
library by Milan Straka and Johan Tibell, having the following goals:

Extensibility; it should be easy to implement a new hashing algorithm on any Hashable type, for instance if one needed more hash bits

Honest hashing of values, and principled hashing of algebraic data types (see e.g. #30)

Cross-platform consistent hash values, with a versioning guarantee. Where possible we ensure morally identical data hashes to indentical values regardless of processor word size and endianness.

Make implementing identical hash routines in other languages as painless as possible. We provide an implementation of a simple hashing algorithm (FNV-1a) and make an effort define Hashable instances in a way that is well-documented and sensible, so that e.g. one can (hopefully) easily implement string hashing routine in JavaScript that will match the way we hash strings here.

I started writing a fast concurrent bloom filter variant, but found none of the
existing libraries fit my needs. In particular `hashable`

was deficient in a
number of ways:

The number of hash bits my data structure requires can vary based on user parameters, and possibly be more than the 64-bits supported by hashable

Users might like to serialize their bloomfilter and store it, pass it to other machines, or work with it in a different language, so we need

- hash values that are consistent across platforms
- some guarantee of consistency across library versions

I was also very concerned about the general approach taken for algebraic types,
which results in collision, the use of “hashing” numeric values to themselves,
dubious combining functions, etc. It wasn’t at all clear to me how to ensure my
data structure wouldn’t be broken if I used `hashable`

. See below for a very
brief investigation into hash goodness of the two libraries.

There isn’t interest in supporting my use case or addressing these issues in
`hashable`

(see e.g. #73, #30, and #74)
and apparently hashable is working in practice for people, but maybe this new
package will be useful for some other folks.

Hashing-based data structures assume some “goodness” of the underlying hash function, and may depend on the goodness of the hash function in ways that aren’t always clear or well-understood. “Goodness” also seems to be somewhat subjective, but can be expressed statistically in terms of bit-independence tests, and avalanche properties, etc.; various things that e.g. smhasher looks at.

I thought for fun I’d visualize some distributions, as that’s easier for my puny brain to understand than statistics. We visualize 32-bit hashes by quantizing by 64x64 and mapping that to a pixel following a hilbert curve to maintain locality of hash values. Then when multiple hash values fall within the same 64x64 pixel, we darken the pixel, and finally mark it red if we can’t go any further to indicate clipping.

It’s easy to cherry-pick inputs that will result in some bad behavior by
hashable, but below I’ve tried to show some fairly realistic examples of
strange or less-good distributions in `hashable`

. I haven’t analysed these
at all. Images are cropped ¼ size, but are representative of the whole 32-bit
range.

First, here’s a hash of all `[Ordering]`

of size 10 (~59K distinct values):

Hashabler:

Hashable:

Next here’s the hash of one million `(Word8,Word8,Word8)`

(having a domain ~ 16 mil):

Hashabler:

Hashable:

I saw no difference when hashing english words, which is good news as that’s probably a very common use-case.

If you could test the library on a big endian machine and let me know how it goes, that would be great. See here.

You can also check out the **TODO**s scattered throughout the code and send
pull requests. I mayb not be able to get to them until June, but will be very
grateful!

I’m always open to interesting work or just hearing about how companies are using haskell. Feel free to send me an email at brandon.m.simmons@gmail.com

]]>At a high level `criterion`

makes your benchmark the inner loop of a function,
and runs that loop a bunch of times, measures the result, and then divides by
the number of iterations it performed. The approach is both useful for comparing
alternative implementations, and probably the only meaningful way of answering
“how long does this code take to run”, short of looking at the assembly and
counting the instructions and consulting your processor’s manual.

If you’re skeptical, here’s a benchmark we’d expect to be very fast:

```
import Criterion.Main
main :: IO ()
main = do
defaultMain [
bench "sum2" $ nf sum [1::Int,2]
, bench "sum4" $ nf sum [1::Int,2,3,4]
, bench "sum5" $ nf sum [1::Int,2,3,4,5]
]
```

And indeed it’s on the order of nanoseconds:

```
benchmarking sum2
time 27.20 ns (27.10 ns .. 27.35 ns)
0.994 R² (0.984 R² .. 1.000 R²)
mean 28.72 ns (27.29 ns .. 32.44 ns)
std dev 6.730 ns (853.1 ps .. 11.71 ns)
variance introduced by outliers: 98% (severely inflated)
benchmarking sum4
time 58.45 ns (58.31 ns .. 58.59 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 58.47 ns (58.26 ns .. 58.66 ns)
std dev 654.6 ps (547.1 ps .. 787.8 ps)
variance introduced by outliers: 11% (moderately inflated)
benchmarking sum5
time 67.08 ns (66.84 ns .. 67.33 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 67.04 ns (66.85 ns .. 67.26 ns)
std dev 705.5 ps (596.3 ps .. 903.5 ps)
```

The results are consistent with each other; `sum`

seems to be linear, taking
13-14ns per list element, across our different input sizes.

This is what I was doing today which motivated this post. I was experimenting with measuring the inner loop of a hash function:

```
fnvInnerLoopTest :: Word8 -> Word32
{-# INLINE fnvInnerLoopTest #-}
fnvInnerLoopTest b = (2166136261 `xor` fromIntegral b) * 16777619
```

These were the results criterion gave me:

```
benchmarking test
time 9.791 ns (9.754 ns .. 9.827 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 9.798 ns (9.759 ns .. 9.862 ns)
std dev 167.3 ps (117.0 ps .. 275.3 ps)
variance introduced by outliers: 24% (moderately inflated)
```

These are the sorts of timescales that get into possibly measuring overhead of function calls, boxing/unboxing, etc. and should make you skeptical of criterion’s result. So I unrolled 4 and 8 iteration versions of these and measured the results:

```
main :: IO ()
main = do
defaultMain [
bench "test" $ nf fnvInnerLoopTest 7
, bench "test4" $ nf fnvInnerLoopTest4 (7,8,9,10)
, bench "test8" $ nf fnvInnerLoopTest8 (7,8,9,10,11,12,13,14)
]
benchmarking test
time 9.380 ns (9.346 ns .. 9.418 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 9.448 ns (9.399 ns .. 9.567 ns)
std dev 240.4 ps (137.9 ps .. 418.6 ps)
variance introduced by outliers: 42% (moderately inflated)
benchmarking test4
time 12.66 ns (12.62 ns .. 12.72 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 12.68 ns (12.64 ns .. 12.73 ns)
std dev 158.8 ps (126.9 ps .. 215.7 ps)
variance introduced by outliers: 15% (moderately inflated)
benchmarking test8
time 17.88 ns (17.82 ns .. 17.94 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 17.89 ns (17.81 ns .. 17.97 ns)
std dev 262.7 ps (210.3 ps .. 349.7 ps)
variance introduced by outliers: 19% (moderately inflated)
```

So this seems to give a more clear picture of how good our bit twiddling is in that inner loop. I was curious if I could measure the overhead directly in criterion though. Somewhat surprisingly to me, it seems I could!

I added the following benchmark to my list:

```
, bench "baseline32" $ nf (\x-> x) (777::Word32)
```

The idea being to isolate the overhead of applying the most trivial function
and calling `nf`

on an example value of our output type (`Word32`

in this
case).

```
benchmarking baseline32
time 9.485 ns (9.434 ns .. 9.543 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 9.509 ns (9.469 ns .. 9.559 ns)
std dev 155.8 ps (122.6 ps .. 227.8 ps)
variance introduced by outliers: 23% (moderately inflated)
```

If we consider this value the baseline for the measurements initially reported,
the new results are both linear-ish, as we would expect, and also the resulting
absolute measurements fall about where we’d expect from the assembly we’d hope
for (I still need to verify that this is *actually* the case), e.g. our intial
`test`

is in the ~1ns range, about what we’d expect from an inner loop with a
couple instructions.

I thought this was compelling enough to open an issue to see whether this technique might be incorporated into criterion directly. It’s at least a useful technique that I’ll keep playing with.

Anyway, benchmark your code.

]]>