Open Source

Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2. 31B params, $0.20/run

A 31B-parameter model costing $0.20 per run trails only Opus 4.6 and GPT-5.2, and beats every Chinese open-source model tested.

Deep Dive

Google's new 31-billion-parameter Gemma 4 model has posted a surprisingly strong result on FoodTruck Bench, an AI business simulation benchmark. In the test, an AI agent must run a virtual food truck for 30 days. Gemma 4 survived all five of its runs (a 100% survival rate) with a median return on investment (ROI) of +1,144%. The most disruptive detail is its cost: $0.20 per simulation run. At that price it beats more expensive models such as Google's own Gemini 3 Pro ($2.95 per run) and trails OpenAI's GPT-5.2 ($4.43 per run) at a small fraction of the cost. It also "absolutely destroys" every major Chinese open-source model tested, including Qwen and DeepSeek variants, which failed to survive consistently.

The benchmark results, which were double-checked for consistent configuration and prompts, make Gemma 4 a strong value proposition for developers. The only models that outperformed it were Anthropic's flagship Claude Opus 4.6 and OpenAI's GPT-5.2. Opus costs $36 per run, 180 times more than Gemma 4. For professionals building agentic workflows, where AI takes autonomous actions, this is the best cost-to-performance ratio observed among the 22 models tested on the platform. The findings suggest that capable, efficient small models are rapidly closing the gap with large, expensive frontier models, which could reshape how teams budget for and deploy AI on complex, multi-step tasks.
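The cost multiples quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the benchmark: the per-run prices are the ones quoted in this article, while the dictionary and the function names (`cost_multiple`, `runs_per_budget`) are hypothetical and introduced only for illustration.

```python
# Per-run costs in US cents, as quoted in this article.
# Integer cents avoid floating-point rounding surprises with dollar amounts.
COST_CENTS = {
    "Gemma 4 31B": 20,
    "Gemini 3 Pro": 295,
    "GPT-5.2": 443,
    "Opus": 3600,
}


def cost_multiple(model: str, baseline: str = "Gemma 4 31B") -> float:
    """Return how many times more one run of `model` costs than one run of `baseline`."""
    return COST_CENTS[model] / COST_CENTS[baseline]


def runs_per_budget(model: str, budget_cents: int) -> int:
    """Return how many whole runs of `model` fit into a budget given in cents."""
    return budget_cents // COST_CENTS[model]


if __name__ == "__main__":
    for name in COST_CENTS:
        print(f"{name}: {cost_multiple(name):.2f}x the cost of a Gemma 4 run")
```

By this arithmetic, the budget for a single Opus run ($36) covers 180 Gemma 4 runs.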

Key Points
  • Achieved 100% survival and +1,144% median ROI on the FoodTruck Bench business simulation.
  • Costs only $0.20 per run, far below GPT-5.2 ($4.43) and Gemini 3 Pro ($2.95), and beats all tested Chinese open-source models.
  • Outperformed only by Claude Opus 4.6, which costs 180x more at $36 per run, and GPT-5.2.

Why It Matters

If the result holds up, it sharply lowers the cost of running sophisticated AI agents, putting advanced automation within reach of more projects and startups.