StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing
A new lossless compressor uses a Mamba SSM and n-gram context mixing to beat xz without a GPU
StateSMix introduces a novel approach to online lossless compression by coupling a Mamba-style State Space Model (SSM) with sparse n‑gram context mixing and arithmetic coding. The system is fully self-contained: it trains token-by-token on the file being compressed, requiring no pre-trained weights, GPU, or external dependencies. The SSM (with a depth multiplier of 32 and 2 layers, totaling about 120,000 active parameters per file) provides continuously updated probability estimates over BPE tokens. Meanwhile, nine sparse n‑gram hash tables (from bigram to 32‑gram, each with 16 million slots) provide exact local and long-range pattern memorization through a softmax‑invariant logit‑bias mechanism that updates only non‑zero‑count tokens. An entropy‑adaptive scaling mechanism modulates the n‑gram contribution based on the SSM’s predictive confidence to prevent over‑correction.
On the standard enwik8 benchmark, StateSMix achieves 2.123 bits per byte (bpb) on 1 MB, 2.149 bpb on 3 MB, and 2.162 bpb on 10 MB, outperforming xz -9e (LZMA2) by 8.7%, 5.4%, and 0.7%, respectively. Ablation experiments show the SSM alone reduces size by 46.6% over a frequency‑count baseline and beats xz without any n‑gram component, while the n‑gram tables add a complementary 4.1% gain through exact context memorization. The implementation is in pure C with AVX2 SIMD, and OpenMP parallelization yields a 1.9× speedup on 4 cores, processing approximately 2,000 tokens per second on commodity x86‑64 hardware.
- StateSMix combines an online‑trained Mamba SSM (120K parameters) with sparse n‑gram hash tables and arithmetic coding to achieve lossless compression without GPUs or pre‑trained weights.
- On enwik8, it beats xz -9e by up to 8.7% (1 MB) and 5.4% (3 MB); the SSM alone yields a 46.6% size reduction over a frequency‑count baseline.
- Pure C implementation with AVX2 SIMD runs at ~2,000 tokens/second on CPU; OpenMP parallelization provides 1.9× speedup on 4 cores.
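The sparse n-gram tables described above can be sketched as a hash map from (context, token) pairs to counts. This is a minimal illustration under assumptions: the 16M-slot size matches the description, but the hash function (FNV-1a here), 16-bit token IDs, and the omission of collision handling are all choices made for brevity, not details from the source.

```c
#include <stdint.h>
#include <stdlib.h>

#define SLOTS (1u << 24)  /* 16 million slots per table, as described */

/* One sparse table per n-gram order; each slot holds a count for a
 * (context-hash, token) pair. Collisions are ignored in this sketch. */
typedef struct {
    uint32_t *counts;  /* SLOTS entries, zero-initialized */
} ngram_table;

/* FNV-1a over the last `order` token IDs plus the candidate next token */
static uint32_t ngram_hash(const uint16_t *hist, size_t pos,
                           size_t order, uint16_t token)
{
    uint32_t h = 2166136261u;
    for (size_t i = pos - order; i < pos; i++) {
        h ^= hist[i];
        h *= 16777619u;
    }
    h ^= token;
    h *= 16777619u;
    return h & (SLOTS - 1);
}

static ngram_table *ngram_new(void)
{
    ngram_table *t = malloc(sizeof *t);
    t->counts = calloc(SLOTS, sizeof *t->counts);
    return t;
}

/* after coding the token at `pos`, record it under its `order`-length context */
static void ngram_update(ngram_table *t, const uint16_t *hist,
                         size_t pos, size_t order, uint16_t tok)
{
    if (pos >= order)
        t->counts[ngram_hash(hist, pos, order, tok)]++;
}

/* lookup used to build the logit bias: zero count -> zero bias */
static uint32_t ngram_count(const ngram_table *t, const uint16_t *hist,
                            size_t pos, size_t order, uint16_t tok)
{
    return (pos >= order) ? t->counts[ngram_hash(hist, pos, order, tok)] : 0;
}
```

In this layout only tokens with non-zero counts for the current context contribute a bias, which is what makes the per-token update cost sparse rather than proportional to the vocabulary size.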
Why It Matters
StateSMix shows modern SSMs can outperform traditional compression algorithms on CPU, offering a practical path to better file compression without specialized hardware.