Simplicity Scales
New serialization format eliminates CPU bottlenecks, achieving 86% of peak memory bandwidth on large records.
Researchers Andrew Sampson and Ronny Chan from the 6OVER3 Institute, with Yuta Saito from GoodNotes, have published a paper titled "Simplicity Scales" introducing a groundbreaking serialization format called Bebop. The core innovation is its use of fixed-width encoding: a 32-bit integer is always four bytes, a 64-bit float is always eight. This eliminates the conditional logic and byte-by-byte inspection required by variable-length formats like Protocol Buffers and JSON, which stall modern CPU pipelines. The result is decoding that is essentially a direct memory read.
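To make the contrast concrete, here is a minimal Rust sketch. It is not the authors' code or Bebop's generated output; the record layout is a hypothetical { id: u32, score: f64 } chosen only for illustration. A fixed-width field decodes with an unconditional load at a known offset, while a Protocol Buffers-style varint needs a data-dependent byte loop.

```rust
/// Hypothetical fixed-width record { id: u32, score: f64 }: every field sits
/// at a known offset, so decoding is two unconditional little-endian loads.
fn decode_fixed(buf: &[u8; 12]) -> (u32, f64) {
    let id = u32::from_le_bytes(buf[0..4].try_into().unwrap());
    let score = f64::from_le_bytes(buf[4..12].try_into().unwrap());
    (id, score)
}

/// Varint decode (LEB128-style, as in Protocol Buffers): how many bytes to
/// read depends on the bytes themselves, so the CPU cannot know the loop
/// length ahead of time and the branch pattern varies with the data.
fn decode_varint(buf: &[u8]) -> (u64, usize) {
    let (mut value, mut shift, mut read) = (0u64, 0u32, 0usize);
    for &byte in buf {
        value |= u64::from(byte & 0x7f) << shift;
        read += 1;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    (value, read)
}

fn main() {
    // Fixed layout: id = 7, score = 2.5, little-endian.
    let mut buf = [0u8; 12];
    buf[0..4].copy_from_slice(&7u32.to_le_bytes());
    buf[4..12].copy_from_slice(&2.5f64.to_le_bytes());
    assert_eq!(decode_fixed(&buf), (7, 2.5));

    // Varint 300 encodes as [0xAC, 0x02]: two data-dependent iterations.
    assert_eq!(decode_varint(&[0xAC, 0x02]), (300, 2));
}
```

Because nothing in the fixed-width path branches on the payload, the compiler can reduce it to plain loads, which is what lets a decoder run at memory speed rather than instruction speed.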
Benchmark results are staggering. Across 19 decode workloads, Bebop outperformed Protocol Buffers by 9x to 213x. On a 1536-dimension embedding vector, common in AI applications, Bebop decoded in 2.8 nanoseconds versus 111 nanoseconds for Protocol Buffers and 4.69 microseconds for the highly optimized simdjson, a 1,675x speedup over simdjson. For large records above 64 KB, the decoder achieves 86% of peak memory bandwidth, meaning the CPU is no longer the bottleneck.
The paper also presents a transport-agnostic RPC protocol built on the Bebop wire format. This protocol introduces 'batch pipelining,' where dependent calls across services can execute in a single round trip with server-side dependency resolution. Critically, it can deploy over HTTP/1.1, HTTP/2, and binary transports without proxies. This removes the strict HTTP/2 requirement that has limited the adoption of gRPC in serverless platforms and web browsers, potentially unlocking new architectures for distributed systems.
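The paper's actual wire format is not reproduced here, but the idea behind batch pipelining can be sketched with hypothetical Rust types: a batch carries several calls, and a later call's argument may reference an earlier call's result, so the server resolves the dependency instead of the client paying a second round trip. All names below (Arg, Call, BatchRequest, the method strings) are illustrative assumptions, not the protocol's schema.

```rust
// Conceptual sketch of batch pipelining; types and names are hypothetical.

#[allow(dead_code)]
enum Arg {
    Value(Vec<u8>),          // literal serialized argument bytes
    ResultOf { call: u16 },  // "use the output of call N from this batch"
}

#[allow(dead_code)]
struct Call {
    id: u16,        // position within the batch
    method: String, // e.g. "users.lookup"
    args: Vec<Arg>,
}

struct BatchRequest {
    calls: Vec<Call>, // the server executes these, resolving ResultOf references
}

fn main() {
    // Two dependent calls shipped in one request: the second call's argument
    // is the server-side result of the first, so no extra round trip is needed.
    let batch = BatchRequest {
        calls: vec![
            Call {
                id: 0,
                method: "users.lookup".into(),
                args: vec![Arg::Value(b"alice".to_vec())],
            },
            Call {
                id: 1,
                method: "orders.for_user".into(),
                args: vec![Arg::ResultOf { call: 0 }],
            },
        ],
    };
    println!("one request, {} dependent calls", batch.calls.len());
}
```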
- Bebop uses fixed-width encoding, so decoding a field is a direct memory read with no conditionals, unlike variable-length formats.
- Achieved 1,675x faster decoding than simdjson on a 1536-dim vector and 86% of peak memory bandwidth on large records.
- Includes a new RPC protocol with batch pipelining that works without HTTP/2, solving gRPC's serverless and browser limitations.
Why It Matters
This could dramatically reduce latency and cost for data-intensive applications like AI inference, real-time analytics, and microservices communication.