Open Source

Google's Gemma 4 (31B) dominates AI benchmarks at just $0.20 per run

r/LocalLLaMA April 06, 2026

⚡ A 31B-parameter model costing $0.20 per run outperforms GPT-5.2 and beats every Chinese open-source model tested.

Deep Dive

Google's new Gemma 4 model, with 31 billion parameters, has delivered an unusually strong result on FoodTruck Bench, an AI business simulation benchmark in which an agent must run a virtual food truck for 30 days. Across five runs, Gemma 4 survived every time (a 100% survival rate) and achieved a median return on investment (ROI) of +1,144%. The most disruptive detail is its cost: $0.20 per simulation run. That result beats far more expensive models such as OpenAI's GPT-5.2 ($4.43 per run) and Google's own Gemini 3 Pro ($2.95 per run), and, in the words of the original post, "absolutely destroys" every major Chinese open-source model tested, including Qwen and DeepSeek variants, which failed to survive consistently.
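
To make the two headline metrics concrete, here is a minimal sketch of how a survival rate and a median ROI could be summarized over a handful of runs. It assumes ROI is measured as (ending value minus starting capital) divided by starting capital, and that the median is taken over all runs; the source does not spell out FoodTruck Bench's exact definitions, and the helper names (`roi_percent`, `summarize_runs`, `roi_to_multiple`) are hypothetical.

```python
from statistics import median


def roi_percent(start: float, end: float) -> float:
    """ROI in percent, assuming ROI = (end - start) / start * 100."""
    return (end - start) / start * 100.0


def summarize_runs(runs: list[tuple[bool, float]]) -> tuple[float, float]:
    """Return (survival rate %, median ROI %) for (survived, roi_percent) pairs.

    Assumption: the median is taken over every run, survived or not.
    """
    survived = sum(1 for ok, _ in runs if ok)
    survival_rate = 100.0 * survived / len(runs)
    median_roi = median(roi for _, roi in runs)
    return survival_rate, median_roi


def roi_to_multiple(roi_pct: float) -> float:
    """Convert an ROI percentage into a multiple of starting capital."""
    return 1.0 + roi_pct / 100.0


# Under the assumed definition, +1,144% ROI means ending with
# roughly 12.44 times the starting capital.
print(f"{roi_to_multiple(1144.0):.2f}x starting capital")
```

Read this way, a +1,144% median ROI means the typical run ended with a little over twelve times its starting capital.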

The benchmark results, which were double-checked for consistency in configuration and prompts, make Gemma 4 a standout value for developers. The only model that outperformed it was Anthropic's flagship Claude 3.5 Opus, at $36 per run, or 180 times the cost of Gemma 4. For teams building agentic workflows, in which AI takes autonomous actions, this is the best performance-to-cost ratio observed among the 22 models tested on the platform. The findings suggest that capable, efficient smaller models are rapidly closing the gap with much larger and more expensive frontier models, which could reshape how teams budget for and deploy AI on complex, multi-step tasks.
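
The cost comparisons reduce to simple division against Gemma 4's $0.20 per run. The sketch below uses only the per-run prices quoted in the article; the dictionary and `cost_multiple` helper are illustrative names, not part of the benchmark. It gives 36.00 / 0.20 = 180 for Claude 3.5 Opus (matching the article's 180x), 4.43 / 0.20 = 22.15 for GPT-5.2, and 2.95 / 0.20 = 14.75 for Gemini 3 Pro.

```python
# Per-run costs in USD, as quoted in the article.
COST_PER_RUN = {
    "Gemma 4 31B": 0.20,
    "Gemini 3 Pro": 2.95,
    "GPT-5.2": 4.43,
    "Claude 3.5 Opus": 36.00,
}


def cost_multiple(model: str, baseline: str = "Gemma 4 31B") -> float:
    """How many times more one run of `model` costs than one run of `baseline`."""
    return COST_PER_RUN[model] / COST_PER_RUN[baseline]


for name in ("Gemini 3 Pro", "GPT-5.2", "Claude 3.5 Opus"):
    print(f"{name}: {cost_multiple(name):.2f}x the per-run cost of Gemma 4 31B")
```

Because only cost figures are given for most models, this compares price alone; a true performance-per-dollar ranking would also need each model's ROI from the benchmark.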

Key Points

Achieved 100% survival and +1,144% median ROI on the FoodTruck Bench business simulation.
Costs only $0.20 per run, outperforming GPT-5.2 ($4.43 per run) and beating all tested Chinese open-source models.
Outperformed only by Claude 3.5 Opus, which costs 180x more at $36 per run.

Why It Matters

This sharply lowers the cost of running capable AI agents, putting advanced automation within reach of more projects and startups.
