Open Source

Bankai (卍解) — the first post-training adaptation method for true 1-bit LLMs.

A new method flips just 0.007% of binary weights to fix model errors, creating 1KB patches that apply in microseconds.

Deep Dive

Developer Nikshepsvn has introduced Bankai, a post-training adaptation technique designed specifically for true 1-bit Large Language Models (LLMs) like PrismML's Bonsai 8B. Unlike ternary models (e.g., BitNet b1.58), Bonsai uses pure binary weights: each parameter is stored as a single bit, 0 or 1. Bankai exploits this by treating model adaptation as the search for an optimal XOR mask, a set of specific bits to flip. The method iteratively tests flipping rows of weights, keeping a flip only if performance improves on a target task (such as computing derivatives or identifying prime numbers) without degrading general capabilities. The resulting 'patch' is extremely sparse: one successful experiment required flipping only 93 rows, affecting a mere 0.007% of the model's weights and producing a patch file of about 1 kilobyte.
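The search described above can be sketched as a greedy loop over weight rows. Everything here is illustrative rather than the actual Bankai implementation: the toy weight matrix, the probe-score proxy (agreement with a fixed target pattern), and the row budget are all assumptions standing in for a real model and evaluation harness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a binary weight matrix (a real 1-bit model has ~1e9 bits).
W = rng.integers(0, 2, size=(256, 64), dtype=np.uint8)
# Proxy for "solves the probe tasks": agreement with a target bit pattern.
target = rng.integers(0, 2, size=W.shape, dtype=np.uint8)

def score(w):
    """Hypothetical probe-task score: higher means more probes solved."""
    return int((w == target).sum())

mask = np.zeros_like(W)   # the XOR mask: 1 means "flip this bit"
best = score(W)
budget = 8                # cap on flipped rows, to keep the patch sparse

rows_flipped = 0
for row in range(W.shape[0]):
    if rows_flipped >= budget:
        break
    trial = mask.copy()
    trial[row] ^= 1       # candidate: flip every bit in this row
    s = score(W ^ trial)
    if s > best:          # keep only flips that improve the probe score
        mask, best = trial, s
        rows_flipped += 1

patched = W ^ mask        # applying the patch is a single XOR
print(f"flipped {rows_flipped} rows = {int(mask.sum())} bits "
      f"({100 * mask.sum() / W.size:.3f}% of weights)")
```

A real run would also re-check general-capability benchmarks before accepting each flip, so that the patch fixes the target task without regressing elsewhere.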

This approach has significant deployment advantages over existing methods like LoRA (Low-Rank Adaptation). Whereas a LoRA adapter can run to roughly 100MB and adds computational overhead at inference time, a Bankai patch applies in microseconds and introduces zero runtime cost, because the patched weights simply become the new model. The research found that these tiny patches can correct specific failures, transforming incorrect answers like 'd/dx [x^7 + x] = 0' into the correct '7x^6 + 1'. Crucially, patches trained on a diverse set of 60 'probe' tasks showed an ability to generalize to unseen problems. The entire toolkit is open source, allowing all eight described experiments to be reproduced in under two hours on an Apple Silicon Mac, democratizing access to this novel form of model editing.
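A rough sketch of why application is so cheap: the patch need only record which rows to flip, and applying it is one in-place XOR per row, with no adapter matrices left in the inference path. The patch layout and variable names below are assumptions for illustration, not the actual file format.

```python
import numpy as np

rng = np.random.default_rng(1)
# Bit-packed binary weights: each uint8 holds 8 weights.
W_packed = rng.integers(0, 256, size=(8192, 512), dtype=np.uint8)

# A Bankai-style patch can be stored as just the flipped row indices
# (hypothetical layout: a 4-byte header plus 4 bytes per row).
patch_rows = np.array([17, 402, 1311], dtype=np.uint32)
patch_bytes = 4 + patch_rows.nbytes
print(f"patch size: {patch_bytes} bytes")

original = W_packed.copy()
# Applying the patch: flip every bit in the listed rows, in place.
# Afterwards there is nothing extra to compute at inference time.
W_packed[patch_rows] ^= 0xFF
```

Because XOR is its own inverse, re-applying the same patch reverts the model, so a device can switch specializations on and off without storing a second copy of the weights.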

Key Points
  • Creates ~1KB patches by flipping on the order of 0.007% of a model's binary weights, fixing targeted failures such as prime-number identification errors.
  • Adds zero inference latency—patches apply in microseconds and modify the model weights directly, unlike 100MB LoRA adapters.
  • Only works on true 1-bit LLMs (e.g., Bonsai 8B) because XOR operations on ternary weight encodings produce invalid states.

Why It Matters

Enables ultra-efficient, on-the-fly specialization of compact AI models on edge devices like phones, with minimal storage and no performance penalty.