LiquidAI's LFM2.5-230M runs in the browser at 1,400 tok/s via custom WebGPU kernels
A 230M-parameter model generates 1,400 tokens per second inside the browser on an M4 Max, with no server needed.
Deep Dive
LiquidAI's LFM2.5-230M model runs locally in the browser on custom WebGPU kernels, which were written by Fable 5 (before it was shut down) and Opus 4.8. The demo video was recorded on an M4 Max, and the demo itself is available on Hugging Face.
Key Points
- LiquidAI's LFM2.5-230M model achieves 1,400 tokens per second in-browser via custom WebGPU kernels.
- The kernels were written by Fable 5 and Opus 4.8; the demo was recorded on an M4 Max Mac.
- The model is available in GGUF format on Hugging Face, so it can run locally with no cloud dependency.
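Some back-of-envelope arithmetic puts the 1,400 tok/s figure in perspective. This minimal Python sketch converts it to an average per-token latency and to the weight-read bandwidth it would imply. The sketch assumes that batch-1 decoding streams every one of the 230M weights from memory once per token. That assumption is a simplification for illustration, not a claim from the demo. The bytes-per-weight options are also hypothetical, because the precision the demo uses is not stated.

```python
# Back-of-envelope decode arithmetic for the reported LFM2.5-230M throughput.
# Assumption (ours, not the demo's): batch-1 decoding reads every weight once
# per token; KV/state traffic, activations, and cache effects are ignored.

TOKENS_PER_SECOND = 1_400        # figure reported for the M4 Max demo
PARAMETERS = 230_000_000         # taken from the model name "230M"

# Hypothetical storage widths; the demo's actual precision is not stated.
BYTES_PER_WEIGHT = {"fp16": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average wall-clock time per generated token, in milliseconds."""
    return 1_000.0 / tokens_per_second


def implied_bandwidth_gb_s(params: int, bytes_per_weight: float,
                           tokens_per_second: float) -> float:
    """Weight bytes streamed per second in GB/s (1 GB = 1e9 bytes)."""
    return params * bytes_per_weight * tokens_per_second / 1e9


if __name__ == "__main__":
    print(f"latency per token: {per_token_latency_ms(TOKENS_PER_SECOND):.3f} ms")
    for name, width in BYTES_PER_WEIGHT.items():
        gb_s = implied_bandwidth_gb_s(PARAMETERS, width, TOKENS_PER_SECOND)
        print(f"{name:>5}: ~{gb_s:.0f} GB/s of weight reads")
```

Under this model, each token takes about 0.714 ms. At fp16, the implied weight traffic comes to roughly 644 GB/s, which is more than the up to 546 GB/s Apple lists for the M4 Max. At 8-bit it comes to about 322 GB/s, and at 4-bit about 161 GB/s. If the simple model holds, that suggests lower-precision weights or effects the estimate ignores. Treat it as a sanity check, not a profile of the actual kernels.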
Why It Matters
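It shows that a small language model can generate text in real time entirely on-device. Skipping the server removes network round trips, which cuts latency. It also keeps prompts and outputs on the user's machine, which improves privacy.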