AI Safety

Coordinal Research's postmortem: AI safety startup fails despite $125K, SOTA RE-Bench gains

A $125K-funded AI safety platform shuts down after two cofounder splits and a failed app demo.

Deep Dive

Coordinal Research, founded by a MATS 6.0 alumnus, aimed to automate AI safety research: a researcher would write 'Replicate X result from paper Y with tweak Z', and the system would provision sandboxed compute, write code, run experiments, and produce a report. After multiple grant rejections, a funder the founder had previously pitched returned via a coworking connection, and the company closed $125K on an MFN SAFE in April 2025. The startup incorporated as a Delaware C-corp and joined the 50/50 accelerator, but it went through two cofounder splits (first with Jacques in October 2025, then with Leo in January 2026) that consumed significant administrative energy.

In Q1 2026, the team made a final push on two fronts: shipping the user-facing app at coordinal.org/app and demonstrating state-of-the-art (SOTA) results on RE-Bench. The RE-Bench work succeeded: the normalized average improved from 0.547 to 1.624 over a month using ~$30K of compute, with 6 of 7 tasks reliably producing non-reward-hacked results using Sonnet 4. The app did not fare as well. When a friend was unable to navigate the interface, it became clear the product was far from shareable. Burnout set in, and when Coefficient Giving declined the requested $1M budget, the founder decided to stop. The postmortem notes that most platform- or tooling-shaped safety work gets built faster by frontier labs (e.g., Claude Code absorbed an early scaffold), and that for-profit AI safety startups often face pressure to become security middleware.

Key Points
  • Coordinal raised $125K on an MFN SAFE from a funder who returned via a coworking connection.
  • The RE-Bench normalized average improved from 0.547 to 1.624 over a month with ~$30K of compute and Sonnet 4, with 6 of 7 tasks reliably producing non-reward-hacked results.
  • The app demo failed (a friend couldn't navigate the interface); burnout followed, and the founder shut down after Coefficient Giving declined a $1M budget request.

Why It Matters

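Coordinal's shutdown highlights structural hurdles for independent AI safety startups: frontier labs tend to build platform and tooling work faster, and for-profit ventures face pressure to drift into security middleware.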