Linking Smaller Haskell Binaries
Haskell binaries can get quite large (think ~100MB), especially for projects with many transitive dependencies. Here are two strategies that can help at link time, the latter being more experimental.
I used the test-pandoc
binary from pandoc on
GHC 9.2.5 below. This was nice because obviously it was easy to test if linking
broke anything (just run the tests).
-split-sections
and --gc-sections
You can instruct ghc to emit code in individual minimal sections, allowing the
linker to easily find and remove dead code. This looks like, in your
cabal.project
:
package *
ghc-options: -split-sections
-- For C code; This likely won't have much effect, but worth including:
gcc-options: -fdata-sections -ffunction-sections
package pandoc
-- All of the major thinkers support gc-sections, but we'll use lld since
-- it supports the next experiment we want to do (and it's fast).
-- NOTE: assumes gcc >= 10, which knows how to invoke lld
ld-options: -fuse-ld=lld -Wl,--gc-sections,--build-id
These options take the size of the stripped binary down from 113M to 83M (-27%).
See Vidar’s blog for further exploration of this technique, in the context of shellcheck.
Identical code folding
Both gold and lld are able to do a limited form of icf, identifying functionally-equivalent sections at link time, and combining them, fixing up any inbound references. lld seems to have a more effective implementation.
You can try it by adding the following to the settings above:
package pandoc
...
-- --print-icf-sections is just for debugging
ld-options: -Wl,--icf=all,--ignore-data-address-equality,--ignore-function-address-equality,--print-icf-sections
This gets the binary down to 64M for me (a further -23%).
Looking closer at ICF
There are lots of reasons we can imagine a Haskell binary having a lot of potential code folding candidates; for instance the same function being inlined and maybe specialized the same way in several modules. I wanted to poke at the binary briefly and see if I could learn anything.
Taking a look at the logs from lld and picking a section arbitrarily:
selected section /tmp/pandoc/dist-newstyle/build/x86_64-linux/ghc-9.2.5/pandoc-3.0/build/libHSpandoc-3.0-inplace.a(HTML.o):(.text..Ls10LWe_info)
removing identical section /tmp/pandoc/dist-newstyle/build/x86_64-linux/ghc-9.2.5/pandoc-3.0/build/libHSpandoc-3.0-inplace.a(HTML.o):(.text..Ls10LWD_info)
… (snip)
removing identical section /tmp/pandoc/dist-newstyle/build/x86_64-linux/ghc-9.2.5/pandoc-3.0/build/libHSpandoc-3.0-inplace.a(LaTeX.o):(.text..LsWGUO_info)
… (snip)
Here’s a section that’s repeated across half a dozen modules within pandoc. Compiling with debug symbols, and inspecting with:
$ objdump -g --dwarf=decodedline -S -d -Mintel dist-newstyle/build/x86_64-linux/ghc-9.2.5/pandoc-3.0/build/libHSpandoc-3.0-inplace.a
…we can find two of the sections mentioned above, and indeed they look the same:
I wasn’t really able to understand how these two map back on to the source code. The dwarf debug symbols associated come from:
pTagContents :: PandocMonad m => InlinesParser m Inlines
pTagContents =
B.displayMath <$> mathDisplay
<|> B.math <$> mathInline
-- XXX FOLDED (orig):
<|> pStr
<|> pSpace
<|> smartPunctuation pTagContents
<|> pRawTeX
<|> pSymbol
<|> pBad
…and…
-- XXX FOLDED
-- FYI: type LP m = ParsecT TokStream LaTeXState m
doubleQuote :: PandocMonad m => LP m Inlines
doubleQuote =
quoted' doubleQuoted (try $ count 2 $ symbol '`')
(void $ try $ count 2 $ symbol '\'')
<|> quoted' doubleQuoted ((:[]) <$> symbol '“') (void $ symbol '”')
-- the following is used by babel for localized quotes:
<|> quoted' doubleQuoted (try $ sequence [symbol '"', symbol '`'])
(void $ try $ sequence [symbol '"', symbol '\''])
Both polymorphic parser code, but beyond that it’s difficult to see what’s going on.
Misc
Interactions with -fdistinct-constructor-tables
I’ve come to rely on -fdistinct-constructor-tables -finfo-table-map
for
profiling and debugging, and I was curious how it interacted with icf.
I created a little test project with:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where
main :: IO ()
main = do
let xs1 = func1 1000000
print $ sum xs1
print xs1
let xs2 = func2 1000000
print $ sum xs2
print xs2
newtype X = X Int deriving (Num,Show)
func1 :: Int -> [X]
func1 n = map X [1..n]
func2 :: Int -> [X]
func2 n = map X [1,3..(n*4)]
… Compiled with all the options above, and ran with +RTS -l-agu -hi -RTS
. I
noticed that unsurprisingly the duplicate info tables are folded away, which
isn’t what we want for debugging:
You’d need to instruct the linker that these separate infotables should be treated as GC roots, and not removed.
An opportunity to speed up compilation?
All these duplicate sections represent not only wasted space in the binary, but also wasted time during compilation: code generation, but also probably time in the simplifier, etc etc. Could GHC be smarter about e.g. caching compilation units that eventually are emitted as sections early in compilation?
Approximately half of the 120K folded sections come from pandoc itself, that is are from just a single invocation of ghc, so the idea doesn’t seem outlandish.
Tools that didn’t work
I tried to play with bloaty to investigate the composition of the binary, but it chokes on Haskell code.
I thought it might be neat to try to use kcov to investigate how much of the resulting binary was dead code, before and after different linker options, but it chokes on (or with) probably the same bug.