Linking Smaller Haskell Binaries

Publish date: Jan 7, 2023
Last updated: Jan 8, 2023

Tags:

Haskell binaries can get quite large (think ~100MB), especially for projects with many transitive dependencies. Here are two strategies that can help at link time, the latter being more experimental.

I used the test-pandoc binary from pandoc on GHC 9.2.5 below. This was nice because obviously it was easy to test if linking broke anything (just run the tests).

`-split-sections` and `--gc-sections`

You can instruct ghc to emit code in individual minimal sections, allowing the linker to easily find and remove dead code. This looks like, in your cabal.project:

package *
  ghc-options: -split-sections
  -- For C code; This likely won't have much effect,  but worth including:
  gcc-options: -fdata-sections -ffunction-sections  

package pandoc
  -- All of the major thinkers support gc-sections,  but we'll use lld since 
  -- it supports the next experiment we want to do (and it's fast).
  -- NOTE: assumes gcc >= 10, which knows how to invoke lld
  ld-options: -fuse-ld=lld -Wl,--gc-sections,--build-id

These options take the size of the stripped binary down from 113M to 83M (-27%).

See Vidar’s blog for further exploration of this technique, in the context of shellcheck.

Identical code folding

Both gold and lld are able to do a limited form of icf, identifying functionally-equivalent sections at link time, and combining them, fixing up any inbound references. lld seems to have a more effective implementation.

You can try it by adding the following to the settings above:

package pandoc
  ...

  -- --print-icf-sections is just for debugging
  ld-options: -Wl,--icf=all,--ignore-data-address-equality,--ignore-function-address-equality,--print-icf-sections

This gets the binary down to 64M for me (a further -23%).

Looking closer at ICF

There are lots of reasons we can imagine a Haskell binary having a lot of potential code folding candidates; for instance the same function being inlined and maybe specialized the same way in several modules. I wanted to poke at the binary briefly and see if I could learn anything.

Taking a look at the logs from lld and picking a section arbitrarily:

selected section /tmp/pandoc/dist-newstyle/build/x86_64-linux/ghc-9.2.5/pandoc-3.0/build/libHSpandoc-3.0-inplace.a(HTML.o):(.text..Ls10LWe_info)
  removing identical section /tmp/pandoc/dist-newstyle/build/x86_64-linux/ghc-9.2.5/pandoc-3.0/build/libHSpandoc-3.0-inplace.a(HTML.o):(.text..Ls10LWD_info)
   … (snip)
  removing identical section /tmp/pandoc/dist-newstyle/build/x86_64-linux/ghc-9.2.5/pandoc-3.0/build/libHSpandoc-3.0-inplace.a(LaTeX.o):(.text..LsWGUO_info)
   … (snip)

Here’s a section that’s repeated across half a dozen modules within pandoc. Compiling with debug symbols, and inspecting with:

$ objdump -g --dwarf=decodedline -S -d -Mintel dist-newstyle/build/x86_64-linux/ghc-9.2.5/pandoc-3.0/build/libHSpandoc-3.0-inplace.a

…we can find two of the sections mentioned above, and indeed they look the same:

identical folded sections

I wasn’t really able to understand how these two map back on to the source code. The dwarf debug symbols associated come from:

pTagContents :: PandocMonad m => InlinesParser m Inlines
pTagContents =
      B.displayMath <$> mathDisplay
  <|> B.math        <$> mathInline
-- XXX FOLDED (orig):
  <|> pStr
  <|> pSpace
  <|> smartPunctuation pTagContents
  <|> pRawTeX
  <|> pSymbol
  <|> pBad

…and…

-- XXX FOLDED
-- FYI: type LP m = ParsecT TokStream LaTeXState m
doubleQuote :: PandocMonad m => LP m Inlines
doubleQuote =
       quoted' doubleQuoted (try $ count 2 $ symbol '`')
                     (void $ try $ count 2 $ symbol '\'')
   <|> quoted' doubleQuoted ((:[]) <$> symbol '“') (void $ symbol '”')
   -- the following is used by babel for localized quotes:
   <|> quoted' doubleQuoted (try $ sequence [symbol '"', symbol '`'])
                            (void $ try $ sequence [symbol '"', symbol '\''])

Both polymorphic parser code, but beyond that it’s difficult to see what’s going on.

Misc

Interactions with `-fdistinct-constructor-tables`

I’ve come to rely on -fdistinct-constructor-tables -finfo-table-map for profiling and debugging, and I was curious how it interacted with icf.

I created a little test project with:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

main :: IO ()
main = do
    let xs1 = func1 1000000 
    print $ sum xs1
    print xs1
    let xs2 = func2 1000000 
    print $ sum xs2
    print xs2

newtype X = X Int deriving (Num,Show)

func1 :: Int -> [X]
func1 n = map X [1..n]

func2 :: Int -> [X]
func2 n = map X [1,3..(n*4)]

… Compiled with all the options above, and ran with +RTS -l-agu -hi -RTS. I noticed that unsurprisingly the duplicate info tables are folded away, which isn’t what we want for debugging:

distinct info tables get folded

You’d need to instruct the linker that these separate infotables should be treated as GC roots, and not removed.

An opportunity to speed up compilation?

All these duplicate sections represent not only wasted space in the binary, but also wasted time during compilation: code generation, but also probably time in the simplifier, etc etc. Could GHC be smarter about e.g. caching compilation units that eventually are emitted as sections early in compilation?

Approximately half of the 120K folded sections come from pandoc itself, that is are from just a single invocation of ghc, so the idea doesn’t seem outlandish.

Tools that didn’t work

I tried to play with bloaty to investigate the composition of the binary, but it chokes on Haskell code.

I thought it might be neat to try to use kcov to investigate how much of the resulting binary was dead code, before and after different linker options, but it chokes on (or with) probably the same bug.