Deflation: Cost to train AI models drops 40% per year - Karpathy
The era of cheap AI is here—here's what actually works and what doesn't.
Deep Dive
Andrej Karpathy reports that the cost to train models like GPT-2 is falling by roughly 40% each year, a rate he believes understates the true trend. The gains come from four fronts: hardware (the H100 GPU), software (Flash Attention 3), algorithms (the Muon optimizer), and data (the FineWeb-edu dataset). Notable wins include Flash Attention 3 (a 9% speedup), a revised optimal token-to-parameter ratio of ~10, and the elimination of wasteful training practices. By contrast, multi-token prediction and FP8 for the lm_head failed to deliver.
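To make the two headline numbers concrete, here is a minimal sketch (the dollar figure and 1.5B-parameter scale are illustrative assumptions, not from the source): a 40% annual cost drop compounds as cost × 0.6 per year, and a token-to-parameter ratio of ~10 means a model with P parameters trains on about 10 × P tokens.

```python
def cost_after_years(initial_cost: float, years: int, annual_drop: float = 0.40) -> float:
    """Training cost remaining after compounding an annual percentage decline."""
    return initial_cost * (1 - annual_drop) ** years

def token_budget(params: float, ratio: float = 10.0) -> float:
    """Training tokens implied by a token-to-parameter ratio."""
    return ratio * params

# Hypothetical example: a $100k training run shrinks to under $8k in five years
# at a sustained -40%/year trend.
print(round(cost_after_years(100_000, 5)))           # 7776

# At a ratio of ~10, a GPT-2-scale model of 1.5B parameters implies
# roughly 15B training tokens (versus ~30B under the Chinchilla ratio of 20).
print(f"{token_budget(1.5e9):.1e}")                  # 1.5e+10
```

The compounding is what makes the trend dramatic: at -40%/year, cost falls by more than 10x in five years.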
Why It Matters
This rapid deflation makes powerful AI models dramatically more accessible, lowering the barrier to entry for startups and researchers.