Developer maximecb builds Rust-native, CPU-only LFM2.5-8B-A1B inference running at ~37 tok/s
8B-parameter model runs entirely on CPU, uses ~7GB of RAM, and decodes at nearly 37 tokens/s
Developer maximecb published a Rust-native language model implementation as a cargo crate; it is still a work in progress. On a Ryzen 7950X, decode speed is nearly 37 tokens/s with memory usage around 7GB, which fits comfortably on a machine with 16GB of RAM. The crate includes callbacks for tool use, lets multiple Agent instances share one set of weights while each keeps its own KV cache, and supports cloning Agent objects so that a shared prompt does not have to be prefilled again. Prefill is not yet optimized and currently runs at roughly the same speed as decode, but the developer is working on speeding it up.
- Decode speed of ~37 tokens/s on Ryzen 7950X, no GPU required
- Memory usage of ~7GB fits comfortably on a machine with 16GB of RAM
- Supports weight sharing across agents and cloning to skip repeated prefill work
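The weight sharing and cloning described above can be pictured with a small Rust sketch. This is not the crate's actual API: `Weights`, `KvCache`, `Agent`, `prefill`, and `demo` are hypothetical names, and the "cache" only records token ids rather than real key/value tensors. It assumes the general pattern of holding read-only weights behind an `Arc` so that cloning an agent copies its cache but not the model.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for model weights: large, read-only, loaded once.
struct Weights {
    #[allow(dead_code)] // never read in this sketch; a real model would use it
    data: Vec<f32>,
}

/// Hypothetical stand-in for a KV cache; it only records token ids here.
#[derive(Clone, Default)]
struct KvCache {
    tokens: Vec<u32>,
}

/// Hypothetical agent: a pointer to shared weights plus a private cache.
#[derive(Clone)]
struct Agent {
    weights: Arc<Weights>, // cloning copies the pointer, not the weights
    cache: KvCache,        // cloning copies the cache contents
}

impl Agent {
    fn new(weights: Arc<Weights>) -> Self {
        Agent { weights, cache: KvCache::default() }
    }

    /// Simulated prefill: append prompt tokens to this agent's own cache.
    fn prefill(&mut self, prompt: &[u32]) {
        self.cache.tokens.extend_from_slice(prompt);
    }

    fn cached_len(&self) -> usize {
        self.cache.tokens.len()
    }
}

/// Returns the two clones' cache lengths and the number of live references
/// to the shared weights.
fn demo() -> (usize, usize, usize) {
    let weights = Arc::new(Weights { data: vec![0.0; 1024] });

    // Prefill a shared prompt once on a base agent.
    let mut base = Agent::new(Arc::clone(&weights));
    base.prefill(&[1, 2, 3, 4]);

    // Each clone starts from the prefilled cache and then diverges.
    let mut a = base.clone();
    let mut b = base.clone();
    a.prefill(&[10]);
    b.prefill(&[20, 21]);

    (a.cached_len(), b.cached_len(), Arc::strong_count(&weights))
}
```

Calling `demo()` from a `main` function shows the two clones diverging after the shared prompt while all agents still point at a single copy of the weights. In a real implementation the cloned cache would hold tensors, so cloning trades some memory for skipping the prefill pass.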
Why It Matters
Shows that an 8B-parameter model can run locally at usable speed on a desktop CPU with 16GB of RAM, making capable local AI accessible without an expensive GPU.