Open Source

LiquidAI's LFM2.5-230M runs in-browser at 1,400 tok/s via custom WebGPU kernels

In-browser AI hits 1,400 tokens per second on an M4 Max, with no server needed.

Deep Dive

LiquidAI's LFM2.5-230M model runs locally in the browser on custom WebGPU kernels written by Fable 5 (before it was shut down) and Opus 4.8. The demo video was recorded on an M4 Max, and a demo is available on Hugging Face.

Key Points
  • LiquidAI's LFM2.5-230M model achieves 1,400 tokens per second in-browser via custom WebGPU kernels.
  • The kernels were written by Fable 5 and Opus 4.8; demo recorded on an M4 Max Mac.
  • The model is available in GGUF format on Hugging Face, so it can run locally without a cloud backend.
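
For a sense of scale, the quoted throughput can be converted into a per-token time budget. The snippet below is a minimal sketch of that arithmetic only; the helper names (`tokensPerSecond`, `msPerToken`) are hypothetical and are not taken from the demo or from any LiquidAI code.

```typescript
// Hypothetical helpers for reasoning about decode throughput; not from the demo.

/** Average decode throughput, in tokens per second. */
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) throw new RangeError("elapsedMs must be positive");
  return tokenCount / (elapsedMs / 1000);
}

/** Average time budget per token, in milliseconds, at a given rate. */
function msPerToken(rate: number): number {
  if (rate <= 0) throw new RangeError("rate must be positive");
  return 1000 / rate;
}

const reportedRate = 1400; // tokens per second, the figure quoted above
const budgetMs = msPerToken(reportedRate); // roughly 0.71 ms per token
const replySeconds = 500 / reportedRate; // a 500-token reply: roughly 0.36 s
```

At that rate, each token has well under a millisecond of budget, which is why a reply of a few hundred tokens would appear almost at once.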

Why It Matters

The demo shows that a small language model can run in real time entirely on-device, which cuts latency and keeps user data local.
