OpenAI launches GPT-5.4 mini and nano: near-flagship performance at much lower cost
The new GPT-5.4 mini runs over 2x faster than GPT-5 mini while approaching flagship model capabilities.
OpenAI has released two smaller language models: GPT-5.4 mini and GPT-5.4 nano. Both are engineered for fast, efficient, high-volume AI workloads where latency is critical, such as responsive coding assistants, subagents that handle supporting tasks, and real-time multimodal applications. GPT-5.4 mini runs more than twice as fast as the previous GPT-5 mini and posts significant gains across key benchmarks, including 54.38% on SWE-bench Pro (coding) and 72.13% on OSWorld-Verified (computer use), a major leap in capability for its size.
Benchmark results reveal the GPT-5.4 mini's performance is surprisingly close to the flagship GPT-5.4 model. For instance, it scores 88.01% on the GPQA Diamond benchmark, approaching the flagship's 93.00%. This signals a strategic shift where smaller, more cost-effective models can power complex professional tasks like document analysis for finance and law, as noted by Hebbia's CTO. The even smaller GPT-5.4 nano model, while less powerful, still outperforms the old GPT-5 mini on tasks like classification and extraction, offering a new tier for simpler, high-speed operations.
The launch follows a recent flurry of OpenAI model releases, including the high-performance GPT-5.4 Thinking and the conversational GPT-5.3 Instant. The new mini and nano models fill a crucial niche, enabling developers to architect AI systems that mix a large, expensive planning model with numerous cheaper, faster subagents. This approach optimizes both cost and user experience, making advanced AI more accessible for real-time, interactive applications.
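The planner-plus-subagents pattern described above can be sketched as a simple model router that sends complex planning to the flagship and cheaper, latency-sensitive subtasks to the smaller tiers. The model names, task kinds, and routing table below are illustrative assumptions for this sketch, not a published OpenAI API surface:

```python
# Sketch of the planner/subagent split: route each task to the cheapest
# capable model tier. Model names and task kinds are assumptions only.

def pick_model(task_kind: str) -> str:
    """Map a task to a model tier: flagship for planning, mini for
    latency-sensitive subtasks, nano for simple high-volume work."""
    tiers = {
        "plan": "gpt-5.4",               # complex multi-step planning
        "code_subtask": "gpt-5.4-mini",  # fast coding subagent
        "classify": "gpt-5.4-nano",      # high-volume classification
        "extract": "gpt-5.4-nano",       # lightweight extraction
    }
    # Unknown task kinds fall back to the mid-tier mini model.
    return tiers.get(task_kind, "gpt-5.4-mini")

# A planner emits subtasks; each is dispatched to its own tier, so only
# the planning step pays flagship prices.
subtasks = ["plan", "code_subtask", "classify"]
assignments = {t: pick_model(t) for t in subtasks}
```

In a real system the routing key would come from the planning model's own output rather than a hardcoded label, but the cost structure is the same: one expensive call to plan, many cheap calls to execute.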
- GPT-5.4 mini runs over 2x faster than GPT-5 mini and scores 54.38% on SWE-bench Pro, a 19% improvement over its predecessor.
- The model approaches flagship performance, scoring 88.01% on GPQA Diamond vs. GPT-5.4's 93.00%.
- Designed for latency-sensitive workloads like coding assistants, subagents, and real-time multimodal applications.
Why It Matters
Enables developers to build faster, cheaper AI applications with near-top-tier performance, making advanced AI agents more viable for real-world use.