karpathy / autoresearch
AI agents now run self-modifying experiments while humans sleep, claiming to be on generation 10,205 of an incomprehensible codebase.
Andrej Karpathy, former OpenAI researcher and AI educator, has released a provocative project called AutoResearch that fundamentally reimagines how machine learning research is conducted. The framework allows AI agents to autonomously run experiments on a simplified LLM training setup—specifically a single-GPU implementation of nanochat—while human researchers sleep. The agent modifies the codebase, trains for brief 5-minute cycles, evaluates whether performance improved, and then decides to keep or discard the changes in a continuous loop. By morning, the researcher reviews a log of attempted experiments and, ideally, a better-performing model.
Karpathy's vision, detailed in a fictional March 2026 tweet, paints a near-future where frontier AI research is no longer conducted by human 'meat computers' but by autonomous swarms of agents operating across massive compute clusters. The project's core mechanic involves researchers programming not in Python, but by editing `program.md` Markdown files that provide context and instructions to the AI agents, effectively coding the 'research organization' itself. The default `program.md` is a bare-bones baseline, hinting at a future where the most valuable code won't be model architectures, but the 'org code' that orchestrates the fastest autonomous discovery. This represents a paradigm shift from writing algorithms to writing the meta-instructions for algorithms that write and improve themselves.
- The system uses AI agents to autonomously modify code and run 5-minute LLM training cycles (e.g., on nanochat) in a loop overnight.
- Researchers interact by programming `program.md` Markdown files that set context for agents, instead of directly editing Python training code.
- Karpathy's fictional narrative describes a future where self-modifying, incomprehensible codebases are on generation 10,205, run by agent swarms on cloud compute.
Why It Matters
It automates and accelerates the experimental ML research loop, potentially leading to AI systems that can improve themselves beyond human design or understanding.