Open Source

I gave 12 LLMs $2,000 and a food truck. Only 4 survived.

In a 30-day business simulation, only 4 of 12 AI agents avoided bankruptcy, with Claude Opus leading at $49K profit.

Deep Dive

A benchmark by the LocalLLaMA community tested 12 LLMs as AI agents, each running a simulated food truck business for 30 days. Every model started with $2,000 and had access to 34 tools for decisions on location, pricing, and inventory. Only 4 models turned a profit, led by Claude Opus ($49K) and GPT-4o ($28K). The other 8 went bankrupt, and every agent that took a loan failed. The benchmark includes a public leaderboard and a playable simulation.
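
To make the setup concrete, here is a minimal Python sketch of what a 30-day run with a bankruptcy check could look like. Only the $2,000 starting cash and the 30-day horizon come from the benchmark description. The demand curve, the costs, the `Decision` fields, and both example policies are hypothetical stand-ins for illustration, not the benchmark's actual rules or tool set.

```python
from dataclasses import dataclass
from typing import Callable

STARTING_CASH = 2_000.0  # from the benchmark description
NUM_DAYS = 30            # from the benchmark description


@dataclass
class Decision:
    """One day's choices. Both fields are hypothetical, for illustration."""
    price: float  # price charged per meal
    units: int    # meals stocked for the day


@dataclass
class Result:
    days_survived: int
    final_cash: float
    bankrupt: bool


def demand_at(price: float) -> int:
    """Toy linear demand curve (an assumption, not the benchmark's model)."""
    return max(0, int(200 - 10 * price))


def run_simulation(
    policy: Callable[[int, float], Decision],
    unit_cost: float = 4.0,           # assumed cost per stocked meal
    daily_fixed_cost: float = 150.0,  # assumed permits, fuel, and so on
) -> Result:
    """Run the policy for up to NUM_DAYS, stopping when cash goes negative."""
    cash = STARTING_CASH
    for day in range(1, NUM_DAYS + 1):
        decision = policy(day, cash)
        # Unsold stock is wasted: sales are capped by both stock and demand.
        sold = min(decision.units, demand_at(decision.price))
        cash += sold * decision.price
        cash -= decision.units * unit_cost + daily_fixed_cost
        if cash < 0:
            return Result(days_survived=day, final_cash=cash, bankrupt=True)
    return Result(days_survived=NUM_DAYS, final_cash=cash, bankrupt=False)


def steady_policy(day: int, cash: float) -> Decision:
    """Stock exactly what the toy demand curve will absorb at this price."""
    return Decision(price=10.0, units=100)


def overstock_policy(day: int, cash: float) -> Decision:
    """Price high and buy far more stock than will sell."""
    return Decision(price=18.0, units=400)


if __name__ == "__main__":
    print(run_simulation(steady_policy))
    print(run_simulation(overstock_policy))
```

In the benchmark the policy would be an LLM choosing among its tools each day. Here it is a plain function, so the loop can be read and run on its own.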

Why It Matters

It provides a practical benchmark for evaluating the multi-step planning and decision-making of AI agents in a simulated business setting, beyond simple chat.