data but there's a separate backup/restore system that dumps ... can often recover data written by old programs . .Well-
Binary serialisation: better, stronger, faster Duncan Coutts March 2014, Fun in the Afternoon 2014
.
Well-Typed .
The Haskell Consultants .
Binary serialisation use cases Two different use case for reading/writing binary data Serialisation of Haskell values
Externally defined formats
I
Don’t really care about the binary format
I
Binary format details are critical
I
Only need to interoperate with same program
I
Need to interoperate with other systems
I
Want to be able to serialise direct to/from internal types
I
Some manual conversion to internal types acceptable
I
e.g. "I just want to save a data file and read it back later"
I
e.g. "I want to read these X509 certificates"
.
Well-Typed .
Binary serialisation use cases Two different use case for reading/writing binary data Serialisation of Haskell values
Externally defined formats
I
Don’t really care about the binary format
I
Binary format details are critical
I
Only need to interoperate with same program
I
Need to interoperate with other systems
I
Want to be able to serialise direct to/from internal types
I
Some manual conversion to internal types acceptable
I
e.g. "I just want to save a data file and read it back later"
I
e.g. "I want to read these X509 certificates"
The binary and cereal packages do both (confusing!) .
Well-Typed .
Binary serialisation use cases Two different use case for reading/writing binary data Serialisation of Haskell values I
Don’t really care about the binary format
I
Only need to interoperate with same program
I
Want to be able to serialise direct to/from internal types
I
e.g. "I just want to save a data file and read it back later" This talk is about the serialisation of Haskell values case .
Well-Typed .
Goals
Goals of this work I
I
replace the serialisation layer in the binary package I get binary into the Haskell Platform much better serialisation format I I I
I
schema-free decoding allow for versioning/migration more compact
significantly faster
.
Well-Typed .
Headline results Much better serialisation format I I
I I
CBOR (like MessagePack but better designed) schema-free decoding (e.g. generic conversion to JSON) allows for versioning/migration 1/2 encoded file size
Dramatically faster I I I
3-5x faster encoding 2-3x faster decoding “real world” benchmarks, not micro-benchmarks
Buy-in from the other binary package hackers .
Well-Typed .
The existing data format
The binary package’s data format I
Very simple and direct
I
Fixed-size “as is” encodings
I
No type information
I
No headers
I
No redundancy
Examples I
Int encoded as 8-bytes big endian
I
List as length prefixed sequence (8-bytes again)
I
Tuples as sequence
I
Sum types use 1-byte constructor tag
.
Well-Typed .
The binary package’s data format
It’s a bad format I
not very compact
I
no redundancy
.
Well-Typed .
The binary package’s data format
It’s a bad format I
not very compact
I
no redundancy
The lack of redundancy is the greater problem
.
Well-Typed .
The binary package’s data format The current format makes developers nervous. Lose the types that generated the data and you’re in trouble: I I I
can’t tell what anything is can’t tell where one bit of data ends and the next begins practically impossible to recover
No redundancy means no chance to detect or localise errors. Debugging incorrect Binary instances is hard. Error information if things go wrong is nigh on useless *** Exception: Data.Binary.Get.runGet at position 5844572: demandInput: not enough bytes .
Well-Typed .
The binary package’s data format Annoying or unsuitable in some use cases
Example Haddock cannot read .haddock files created by older versions Thus .haddock files not shareable via hackage while hoogle text DB files are shareable
.
Well-Typed .
The binary package’s data format Annoying or unsuitable in some use cases
Example Haddock cannot read .haddock files created by older versions Thus .haddock files not shareable via hackage while hoogle text DB files are shareable I do not trust it for medium/long term data storage
Example The hackage server uses cereal (via acid-state) for most data but there’s a separate backup/restore system that dumps all data in text or other standard formats. .
Well-Typed .
A new data format
What do we want in a data format?
Programmers like text formats I
easy to ‘eyeball’ the data
I
can usually tell where data begins and ends
I
can usually guess what data is
I
easier to manage ad-hoc upgrades and migrations
I
can often recover data written by old programs
.
Well-Typed .
What do we want in a binary data format?
Properties we want in binary data format I
compact encoded size
I
fast encoding and decoding decode without a schema (ie the program types)
I
I I
‘eyeball’ the data testing, debugging and recovery
I
redundancy to help error checking
I
option to change schema while still reading old data files
.
Well-Typed .
Existing formats we could steal Various off-the-shelf binary formats I
MessagePack
I
Protocol buffers
I
BSON
I
Thrift
I
BERT
I
CBOR
Many of these provide schema-less decoding. Several provide equivalent of JSON “universal” data model. Just steal one: re-use an existing specification and tools.
.
Well-Typed .
CBOR
We are picking the CBOR format I
“Concise Binary Object Representation”
I
Specified in RFC 7049 (proposed standard)
I
JSON-like data model
I
Self-describing low level structure
I
Much like MessagePack but better designed
I
Reasonably flexible and extensible
.
Well-Typed .
CBOR CBOR data items have a 1-byte header I I I I
specifies type (integers, bytes, string, lists etc) specifies format variable length encodings small values directly in header
Examples Value 10
Binary (hex) 1A
Value [1, 2, 3]
Binary (hex) 83 01 02 03
100
18 64
[1, [2, 3]]
82 01 82 02 03
"" "a"
60
[ 1, 2, 3]
9F 01 02 03 FF
61 61
( "a", "b")
7F 61 61 61 62 FF
[]
80
100000.0
FA 47 C3 50 00 .
Well-Typed .
Encoding constructors data T = C1 Int Int | C2 String | C3 We will encode constructors as CBOR arrays, with a constructor tag: Haskell value C1 3 4
CBOR notation [0, 3, 4]
C2 "hi"
[1, "hi" ]
C3
[2]
Even single-constructor types will use a tag.
.
Well-Typed .
A mechanism for versioning / migration data T = C1 Int Int | C2 String [Int] | C3
-- previously just: C2 String
When we add or change a constructor we can use a new tag. The decoder may recognise the old tag and decode and covert. Haskell value old C2 "hi"
CBOR notation [1, "hi" ]
new C2 "hi" [ ]
[3, "hi", [ ]]
.
Well-Typed .
A mechanism for versioning / migration data T = C1 Int Int | C2 String [Int] | C3
-- previously just: C2 String
When we add or change a constructor we can use a new tag. The decoder may recognise the old tag and decode and covert. Haskell value old C2 "hi"
CBOR notation [1, "hi" ]
new C2 "hi" [ ]
[3, "hi", [ ]]
This is not automatic, but it is possible. Further library work is needed (like safecopy). .
Well-Typed .
Benchmarks
Benchmark setup
“real world” benchmark rather than micro-benchmark I
Serialising .cabal file data structures I I
I I
I
100 for ‘small’ criterion benchmarks 10k for ‘big’ GC/alloc benchmarks
Mostly lists and deeply nested trees of custom constructors Comparing new code against I binary I cereal I aeson, just for reference Prototype using MessagePack format rather than CBOR
.
Well-Typed .
Encoded file size (MB)
Serialised data size
100
50
0 aeson binary cereal new I I
binary and cereal produce identical files new format is 1/2 size of binary/cereal .
Well-Typed .
Time (milliseconds)
Encode and decode time encode decode 100
50
0 aeson
binary
I
5x faster encode than cereal
I
3x faster decode than binary
new
cereal
.
Well-Typed .
Time (milliseconds)
Encode and decode time encode decode decode + deepseq
100
50
0 aeson
binary
new
cereal
Deepseq issue needs work
.
Well-Typed .
Decoding for large amounts of data
Time (seconds)
2 1.5 1 0.5 0 binary I I
cereal
new
streaming decode to avoid all results in memory at once aeson is off the scale at 25sec .
Well-Typed .
Decoding allocation
Allocation (GB)
8 6 4 2 0 binary I I
cereal
new
dramatically less allocation allocation costs mirror decode time .
Well-Typed .
Optimisation
Existing binary approach We’ll focus on serialisation.
.
Well-Typed .
Existing binary approach We’ll focus on serialisation. Current serialisation uses a Builder monoid newtype Builder = Builder { runBuilder :: (Buffer → IO L.ByteString) → Buffer → IO L.ByteString } data Buffer = Buffer !(ForeignPtr Word8) !Int !Int !Int Continuation style, passing a Buffer around. Performs well in micro-benchmarks and badly for “real world” tree-structured data. .
Well-Typed .
Why it’s slow
Tricky...
.
Well-Typed .
Why it’s slow
Tricky... It generates a lot of code. Writing an optimal code generator is harder than writing optimal code. Every buffer check needs a resume continuation (which duplicates code). Allocates large multi-argument closures (continuations). Allocates many Buffer structures.
.
Well-Typed .
Illustrative example
data TwoInts = TwoInts {-# UNPACK #-} {-# UNPACK #-}
!Int !Int
encodeTwoInts :: TwoInts → Builder encodeTwoInts (TwoInts a b) = word64BE (fromIntegral a) word64BE (fromIntegral b) Simplifying for the example by using the Builder layer, rather than Put writer monad.
.
Well-Typed .
Encoder core (simplified) encodeTwoInts1 = λa b k ptr1 ptr2 s → case (minusAddr # ptr2 ptr1) < # 16 of False → case writeWord8OffAddr # ptr1 0 (uncheckedShiftRL # a 56) s of -- repeat 7 more times for a and 8 for b... s0 → k (BufferRange (plusAddr # ptr1 16) ptr2 s0 )
True → (# s, BufferFull 16 ptr1 $ λbuf → case buf of BufferRange ptr10 ptr20 → case writeWord8OffAddr # ptr10 0 (uncheckedShiftRL # a 56) s of -- repeat 7 more times for a and 8 for b... s0 → k (BufferRange (plusAddr # ptr10 16) ptr20 s0 ) #) .
Well-Typed .
The new trick The trick is to use a deep embedding of the builder actions data FlatTerm = OutWord !Word FlatTerm | OutInt !Int FlatTerm | OutBytes !ByteString FlatTerm ... | OutStreamEnd Even though a deep embedding involves allocation and indirection! Credit to Simon Meier for trying this crazy idea.
.
Well-Typed .
The new trick The trick is to use a deep embedding of the builder actions data FlatTerm = OutWord !Word FlatTerm | OutInt !Int FlatTerm | OutBytes !ByteString FlatTerm ... | OutStreamEnd The Binary instances produce the FlatTerm data structure. newtype Encoding = Encoding (FlatTerm → FlatTerm) class Binary a where encode :: a → Encoding .
Well-Typed .
The new trick The trick is to use a deep embedding of the builder actions data FlatTerm = OutWord !Word FlatTerm | OutInt !Int FlatTerm | OutBytes !ByteString FlatTerm ... | OutStreamEnd Fast “interpreter” function runs through the FlatTerm , doing the format encoding details and writing out to a buffer toBuilder :: Encoding → B.Builder
.
Well-Typed .
Encode core (simplified)
encodeTwoInts1 = λti k → case ti of TwoInts a b → OutListLen 3 (OutWord 1 (OutInt a (OutInt b k)))
Allocates the whole encode description in one go. No messing. All dynamic checks left to the interpreter.
.
Well-Typed .
Encoder interpreter toBuilder :: Encoding → B.Builder toBuilder (Encoding f) → B.builder (step (f OutStreamEnd)) step toks k (BI.BufferRange optr optr_end) = go toks optr where go toks ! optr | optr ‘plusPtr‘ bound 6 optr_end = case toks of OutWord x toks0 → B.runB encWord x optr >>= go toks0 OutFloat x toks0 → B.runB encFloat x optr >>= go toks0 ... | otherwise = return $ B.bufferFull bound op (step toks k) Heavily optimised low-level interpreter loop.
.
Well-Typed .
Decoder deep embedding Also apply deep embedding trick to the decoder newtype Decoder a = Decoder { runDecoder :: ∀r. (a → DecodeAction r) → DecodeAction r } data DecodeAction a = ConsumeToken (TermToken → DecodeAction a) | Fail String | Done a data TermToken = TkWord !Word | TkInt !Int | TkBytes !B.ByteString ...
.
Well-Typed .
Decoder interpreter go_fast :: DecodeAction a → B.ByteString → SlowPath a go_fast (ConsumeToken k) !bs | B.length bs > 9 -- plenty of space = case decodeDispatch (B.unsafeHead bs) bs of RsToken sz tok → go_fast (k tok) (B.unsafeDrop sz bs) RsString sz len → SlowString (B.unsafeDrop sz bs) k len RsBytes sz len → SlowBytes (B.unsafeDrop sz bs) k len RsFail msg → SlowFail bs msg go_fast da !bs = go_fast_end da bs Fast path / slow path optimisation I I I
loop while we still have enough input buffer return reason for falling out of the fast path slow path fixes things up and calls fast path again .
Well-Typed .
Thanks! Questions?
.
Well-Typed .
More decoder stuff...
Decode code
decodeTwoInts :: Decoder TwoInts decodeTwoInts = do expectListOfLen 3 expectTagOf 1 x ← decode y ← decode return (TwoInts x y) Common pattern
.
Well-Typed .
Decode core (simplified) decodeTwoInts1 = λk → let mkTwoInts = ConsumeToken $ λtk1 → case tk1 of TkInt n1 → ConsumeToken $ λtk2 → case tk2 of TkInt n2 → k (TwoInts n1 n2) checkTag = ConsumeToken $ λtk → case tk of TkWord tag → case eqWord # 1 tag of False → . . . True → mkTwoInts in ConsumeToken $ λtk → case tk of TkListLen len → case eqWord # 3 len of False → . . . True → checkTag Has to allocate six closures! .
Well-Typed .
Decode optimisation idea Want to minimise closure allocation decodeTwoInts :: Decoder TwoInts decodeTwoInts = do expectListOfLen 3 expectTagOf 1 x ← decode y ← decode return (TwoInts x y) Necessarily has two closures, due to x and y .
.
Well-Typed .
Decode optimisation idea
Perhaps a stack approach? decodeTwoInts :: Decoder TwoInts decodeTwoInts = do expectListOfLen 3 expectTagOf 1 pushDecode pushDecode transformStack (λx y → TwoInts x y)
.
Well-Typed .