Open Source

LiquidAI's LFM2.5 hybrid model runs on any potato device

r/LocalLLaMA May 29, 2026

⚡ Only 1B active parameters per token; matches larger models on agentic tasks

Deep Dive

LiquidAI released LFM2.5-8B-A1B, a new model in its hybrid LFM2.5 family optimized for on-device AI. Built on the LFM2 architecture with extended pre-training and reinforcement learning, the model has 8B total parameters but activates only about 1B of them per token (hence "A1B"). Because each token touches only a fraction of the weights, per-token compute stays close to that of a much smaller dense model, which lets it run on both CPUs and GPUs, including modest consumer hardware such as an older ("potato") laptop or a phone. All 8B weights still have to be stored, so memory rather than compute is the main constraint on small devices. Despite its small active footprint, LiquidAI reports that LFM2.5 competes with much larger dense and mixture-of-experts (MoE) models on instruction-following and agentic benchmarks.
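
To make the "8B total, 1B active" idea concrete, here is a toy mixture-of-experts (MoE) layer in Python. It is a minimal sketch under stated assumptions, not LFM2.5's implementation: the expert count, top-k value, and hidden size are hypothetical, and the article does not describe how LFM2.5 routes tokens. A small router scores every expert for each token, only the top-k experts actually run, and their outputs are mixed with softmax weights renormalized over the chosen experts, so most expert weights sit idle for any given token.

```python
import math
import random

# Hypothetical toy configuration for illustration only; NOT LFM2.5's real layout.
NUM_EXPERTS = 8  # experts in this toy MoE layer (assumed)
TOP_K = 1        # experts run per token (assumed)
D_MODEL = 4      # toy hidden size (assumed)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyMoELayer:
    def __init__(self, num_experts, top_k, d_model, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one score row per expert.
        self.router = random_matrix(num_experts, d_model, rng)
        # Each toy expert is a single d_model x d_model linear map (real experts are MLPs).
        self.experts = [random_matrix(d_model, d_model, rng) for _ in range(num_experts)]

    def forward(self, x):
        """Route one token vector to its top-k experts; return (output, chosen expert ids)."""
        scores = matvec(self.router, x)
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        chosen = ranked[: self.top_k]
        # Mixing weights are renormalized over the chosen experts only.
        gates = softmax([scores[i] for i in chosen])
        out = [0.0] * len(x)
        for gate, i in zip(gates, chosen):
            y = matvec(self.experts[i], x)  # only the chosen experts' weights are read
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, chosen

    def param_counts(self):
        """Return (total, active-per-token) expert parameter counts."""
        per_expert = len(self.experts[0]) * len(self.experts[0][0])
        return per_expert * len(self.experts), per_expert * self.top_k


if __name__ == "__main__":
    layer = ToyMoELayer(NUM_EXPERTS, TOP_K, D_MODEL)
    output, chosen = layer.forward([1.0, -0.5, 0.25, 2.0])
    total, active = layer.param_counts()
    print("experts used:", chosen)
    print(f"expert params: {total} total, {active} active ({active / total:.1%})")
```

With 8 experts and top-1 routing, each token reads 1/8 of the expert weights, the same 12.5% ratio as 1B active out of 8B total. In a real model, shared components such as embeddings are always active, so the headline ratio is not simply top-k divided by the number of experts.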

The model has day-one support in popular inference engines: llama.cpp for CPU inference (with optional GPU offload), MLX for Apple Silicon, and vLLM and SGLang for high-throughput serving. LiquidAI positions LFM2.5 for real-world applications such as on-device personal assistants that chain tool calls and follow complex instructions. The company claims the best throughput in its size class, which would make the model well suited to edge devices, where running locally reduces cloud dependency and network latency. LiquidAI's early benchmarks show it outperforming larger models on agentic reasoning while keeping resource usage low. The release points toward capable AI that runs locally on everyday hardware.
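
The on-device claim can be sanity-checked with rough arithmetic. The Python sketch below assumes the headline 8B/1B figures, common 16-, 8-, and 4-bit weight formats (an assumption; the article does not say which quantizations are published), and the standard rule of thumb of about 2 FLOPs per active parameter per token. It shows the trade-off sparse models make: per-token compute scales with the ~1B active parameters, while memory still has to hold all 8B weights.

```python
# Back-of-envelope sizing for a sparse model. The 8B total / 1B active figures come
# from the article; the bit widths are common quantization levels chosen for illustration.
TOTAL_PARAMS = 8e9
ACTIVE_PARAMS = 1e9


def weight_memory_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB.

    Ignores KV cache, activations, and per-block quantization metadata (scales, zero points),
    so real file sizes and RAM usage will be somewhat larger.
    """
    return num_params * bits_per_weight / 8 / 2**30


def forward_flops_per_token(active_params):
    """Rule-of-thumb ~2*N FLOPs per token: one multiply and one add per active weight."""
    return 2 * active_params


if __name__ == "__main__":
    # All 8B weights must be resident (or memory-mapped), even though only ~1B are used per token.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gib(TOTAL_PARAMS, bits):.1f} GiB")
    dense = forward_flops_per_token(TOTAL_PARAMS)
    sparse = forward_flops_per_token(ACTIVE_PARAMS)
    print(f"per-token compute vs. a dense 8B model: ~{dense / sparse:.0f}x less")
```

Under these assumptions the weights alone need roughly 14.9 GiB at 16 bits, 7.5 GiB at 8 bits, and 3.7 GiB at 4 bits, so quantization, rather than sparsity alone, is what brings an 8B-total model within reach of low-memory devices.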

Key Points

Hybrid model with 8B total parameters, only 1B active per token (A1B) for extreme efficiency
Competitive with larger dense and MoE models on instruction following and agentic tasks
Supports llama.cpp, MLX, vLLM, and SGLang from day one for CPU and GPU inference

Why It Matters

Enables powerful AI assistants on edge devices without cloud dependency, unlocking real-time, private agentic AI.
