Mistral AI Introduces Mistral Small 4, Unifying Reasoning, Multimodal, and Agentic Coding Capabilities
One model now handles reasoning, vision, and coding agents with 3x throughput
Mistral AI today announced Mistral Small 4, a hybrid Mixture-of-Experts (MoE) model that unifies the capabilities of three previous flagship models: Magistral (reasoning), Pixtral (multimodal), and Devstral (agentic coding). The model has 119B total parameters but activates only 6B per token (8B including embeddings), routing each token to 4 of its 128 experts, a sparse design that lets experts specialize while keeping per-token compute low. It accepts text and image inputs natively and supports a 256K context window, making it suitable for long-form interactions and tasks ranging from document parsing to visual analysis.
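To make the sparse-activation idea concrete, here is a minimal top-k routing sketch in PyTorch. It is illustrative only, not Mistral's implementation, and the hidden size is a toy value; the point is that each token touches only 4 of 128 experts, which is how a 119B-parameter model does roughly 6B parameters' worth of work per token.

```python
import torch
import torch.nn.functional as F

# Toy MoE layer: 128 experts, 4 active per token (HIDDEN=64 is a toy size).
NUM_EXPERTS, TOP_K, HIDDEN = 128, 4, 64

router = torch.nn.Linear(HIDDEN, NUM_EXPERTS)
experts = torch.nn.ModuleList(
    torch.nn.Linear(HIDDEN, HIDDEN) for _ in range(NUM_EXPERTS)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """x: (tokens, HIDDEN). Each token is routed to its top-4 experts."""
    probs = F.softmax(router(x), dim=-1)              # (tokens, 128)
    weights, idx = torch.topk(probs, TOP_K, dim=-1)   # pick 4 of 128 experts
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over top-4
    out = torch.zeros_like(x)
    for t in range(x.size(0)):
        for k in range(TOP_K):  # only the selected experts run for this token
            out[t] += weights[t, k] * experts[int(idx[t, k])](x[t])
    return out

y = moe_forward(torch.randn(5, HIDDEN))  # 5 tokens in, 5 tokens out
```

Production implementations batch tokens by expert rather than looping per token, but the routing logic is the same: total parameter count grows with the expert count while per-token FLOPs stay roughly constant.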
A key innovation is the configurable reasoning_effort parameter, which lets users toggle between fast, low-latency responses (reasoning_effort="none") and deep, step-by-step reasoning (reasoning_effort="high"). Compared to Mistral Small 3, the new model cuts end-to-end completion time by 40% and serves 3x more requests per second. On benchmarks such as AIME 2025 and LiveCodeBench, Mistral Small 4 matches or surpasses GPT-OSS 120B while generating 20-50% shorter outputs, which reduces both latency and inference cost. The model is released under the Apache 2.0 license, is optimized for deployment on 4x NVIDIA HGX H100 or 2x DGX B200, and ships with support for vLLM, SGLang, llama.cpp, and Transformers.
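The announcement names the parameter and its values but not how it is exposed on the wire. A minimal sketch against Mistral's chat completions endpoint, assuming a model id of "mistral-small-4" and that reasoning_effort is accepted as a top-level request field (both are assumptions), might look like this:

```python
import os
import requests

# Assumptions: the model id "mistral-small-4" and reasoning_effort as a
# top-level request field are illustrative; only the parameter name and its
# "none"/"high" values come from the announcement.
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small-4",
        "reasoning_effort": "high",  # "none" = fast chat, "high" = deep reasoning
        "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```

For self-hosting, vLLM is one of the supported runtimes; a sketch using its offline API, assuming a hypothetical Hugging Face id and a four-GPU node matching the stated H100 target:

```python
from vllm import LLM, SamplingParams

# "mistralai/Mistral-Small-4" is a placeholder id; tensor_parallel_size=4
# shards the weights across the four GPUs of the stated H100 deployment target.
llm = LLM(model="mistralai/Mistral-Small-4", tensor_parallel_size=4)
outputs = llm.generate(
    ["Extract the parties and effective date from this contract: ..."],
    SamplingParams(temperature=0.2, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```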
- 119B total parameters, 6B active per token, with 128 MoE experts (4 active per token)
- Configurable reasoning effort: toggle between fast chat and deep reasoning with a single parameter
- 3x more requests per second and 40% lower end-to-end latency vs Mistral Small 3, matching or beating GPT-OSS 120B on reasoning and coding benchmarks
Why It Matters
One open-source model now replaces three specialized ones, cutting infrastructure costs and latency for enterprises.