Liquid AI releases LFM2-24B-A2B
New sparse MoE model packs 24B parameters but only activates 2.3B per token for efficient local AI.
Liquid AI has launched LFM2-24B-A2B, a significant scaling milestone for its LFM2 architecture. At 24 billion parameters, it is the largest release in a family that starts at 350M parameters, and it demonstrates predictable, log-linear quality improvements across nearly two orders of magnitude of scale. The model retains the hybrid convolutional and grouped-query attention (GQA) design of earlier LFM2 models within a sparse Mixture-of-Experts (MoE) framework, featuring 40 layers with 64 experts per MoE block and top-4 routing.
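For readers new to sparse MoE, the sketch below shows how top-k expert routing works in plain PyTorch. It is a generic illustration rather than Liquid AI's implementation; the layer dimensions, router, and expert networks here are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k Mixture-of-Experts block (illustrative only, not LFM2's actual code)."""

    def __init__(self, d_model=1024, d_ff=2048, n_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward network; only top_k of them run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (n_tokens, d_model)
        logits = self.router(x)                # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e       # tokens routed to expert e in this slot
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```

With top-4 routing over 64 experts, only a small fraction of each MoE block's weights participates in any given token, which is how a 24B-parameter model can touch roughly 2.3B parameters per token.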
Technically, LFM2-24B-A2B's key feature is efficient parameter utilization: of its 24 billion total parameters, only 2.3 billion are active in each token's forward pass. This design concentrates capacity in total parameters rather than active compute, keeping inference latency and energy consumption manageable. The model is specifically engineered to operate within 32GB of RAM, enabling deployment on high-end consumer laptops and desktops without specialized hardware.
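As a rough sanity check on the 32GB figure, the back-of-envelope arithmetic below estimates the weight footprint at a few common precisions. The bytes-per-parameter values are generic assumptions rather than measurements of the released files, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Back-of-envelope weight-memory estimate (illustrative assumptions, not measured values).
TOTAL_PARAMS = 24e9    # total parameters stored in memory
ACTIVE_PARAMS = 2.3e9  # parameters touched per token (drives compute, not storage)

precisions = {               # approximate bytes per parameter (assumed, typical values)
    "fp16/bf16": 2.0,
    "8-bit GGUF": 1.0625,    # ~8.5 bits/weight including quantization metadata
    "4-bit GGUF": 0.5625,    # ~4.5 bits/weight including quantization metadata
}

for name, bytes_per_param in precisions.items():
    gib = TOTAL_PARAMS * bytes_per_param / 2**30
    print(f"{name:>10}: ~{gib:5.1f} GiB of weights")

# All 24B weights must reside in memory, but each token's forward pass
# only reads and computes with roughly ACTIVE_PARAMS of them.
```

Under these assumptions, full-precision weights would exceed 32GB, but typical 8-bit and 4-bit GGUF quantizations fit with room left over for the KV cache and the operating system.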
Contextually, this release challenges the assumption that bigger models must cost proportionally more to run: Liquid AI's results suggest quality can scale predictably with total parameter count without inflating per-token compute. The model posts strong results across benchmarks including GPQA Diamond, MMLU-Pro, IFEval, IFBench, GSM8K, and MATH-500, indicating that the LFM2 architecture's gains don't plateau at small model sizes.
Practical implications include immediate availability as an open-weight instruct model on Hugging Face with day-zero support for popular inference frameworks like llama.cpp, vLLM, and SGLang. Multiple GGUF quantizations are available, making it accessible for developers and researchers. This represents a significant step toward democratizing powerful AI that can run locally, reducing dependency on cloud infrastructure while maintaining competitive performance.
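As a quick-start illustration, the snippet below loads an instruct checkpoint with Hugging Face Transformers. The repository id, dtype handling, and generation settings are assumptions for illustration; consult the model card for the exact repository name and recommended parameters.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; confirm the exact name on Liquid AI's Hugging Face page.
repo_id = "LiquidAI/LFM2-24B-A2B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user",
             "content": "Summarize the benefits of sparse MoE models in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For CPU-only or lower-memory machines, the GGUF quantizations with llama.cpp are likely the more practical route than loading full-precision weights as shown here.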
- 24B total parameters with only 2.3B active per token via sparse MoE architecture
- Designed to run within 32GB RAM for deployment on consumer laptops and desktops
- Shows log-linear quality improvements scaling from 350M to 24B parameters across major benchmarks
Why It Matters
Enables powerful 24B-parameter AI models to run locally on consumer hardware, reducing cloud dependency and costs.