Game Boy Color runs TinyStories-260K transformer natively
A 260K-parameter AI model runs on a 1998 handheld with no internet connection.
A developer has run a real transformer language model entirely on an unmodified Game Boy Color. Maddie Dreese ported Andrej Karpathy's TinyStories-260K, a 260K-parameter model trained to generate simple short stories, to the GBC's 8-bit Z80-like CPU. The weights are quantized to INT8 and stored in bank-switched cartridge ROM, with the ROM loaded from an EZ Flash Junior flash cart. Because the GBC has no floating-point hardware, all arithmetic uses fixed-point approximations, which costs considerable accuracy. The prompt is entered with the D-pad and buttons on an on-screen keyboard and tokenized on-device; the ROM then runs both transformer prefill and autoregressive generation. The KV cache lives in cartridge SRAM because the GBC's work RAM is only 32KB. Inference is extremely slow, at seconds per token, and the output is gibberish, but the core forward pass and attention mechanism run as intended.
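To make the quantization step concrete, here is a minimal Python sketch of symmetric INT8 quantization followed by an integer-only matrix-vector product with a fixed-point rescale. It is not taken from the project's source: the function names, the Q8.8 output format, and the 24-bit scale multiplier are all assumptions chosen for illustration, and the actual ROM may quantize and rescale differently.

```python
FRAC_BITS = 8    # assumed output format: Q8.8 (real value = raw / 256)
SCALE_BITS = 24  # assumed precision of the combined scale multiplier


def quantize_int8(values):
    """Symmetric quantization: floats -> (ints in [-127, 127], float scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def matvec_int8(w_q, w_scale, x_q, x_scale):
    """Integer-only matrix-vector product; returns Q8.8 integers."""
    # The float scales are folded into one integer multiplier ahead of time
    # (e.g. on a PC), so the inner loop needs no floating point at all.
    multiplier = round(w_scale * x_scale * (1 << SCALE_BITS))
    out = []
    for row in w_q:
        acc = 0  # on an 8-bit CPU this would be a multi-byte accumulator
        for w, x in zip(row, x_q):
            acc += w * x
        # Rescale the integer dot product into Q8.8 with a single shift.
        out.append((acc * multiplier) >> (SCALE_BITS - FRAC_BITS))
    return out


# Tiny example: a 2x3 weight matrix times a 3-vector.
weights = [[0.5, -0.25, 0.125], [-1.0, 0.75, 0.0]]
x = [1.0, -2.0, 0.5]

flat_q, w_scale = quantize_int8([w for row in weights for w in row])
w_q = [flat_q[i:i + 3] for i in range(0, len(flat_q), 3)]
x_q, x_scale = quantize_int8(x)

y_fixed = matvec_int8(w_q, w_scale, x_q, x_scale)
y_approx = [v / (1 << FRAC_BITS) for v in y_fixed]
y_exact = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
```

Comparing `y_approx` with `y_exact` shows a small error for a single product. Errors of this kind accumulate across layers, which is one plausible contributor to degraded output at this precision.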
This project underscores how far AI efficiency has come: a 260K-parameter transformer (several hundred times smaller than the smallest GPT-2 variant, which has 124M parameters) fits on a 1998 handheld gaming system with a 4.19 MHz CPU (8.38 MHz in double-speed mode), 32KB of work RAM, and a 64KB address space in which only 32KB is mapped to cartridge ROM at a time. Dreese used Codex (an AI coding assistant) to help build the ROM, and the source is available on GitHub. While not practically useful, the port shows that very limited hardware can execute a modern transformer architecture given aggressive quantization and careful memory management. The achievement resonates with the retrocomputing and AI-at-the-edge communities, showing that the gap between 'impossible' and 'possible' is often just clever engineering.
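A back-of-the-envelope budget shows why bank switching and cartridge SRAM come into play. The sketch below assumes the standard Game Boy cartridge layout of 16KB switchable ROM banks and 8KB cartridge RAM banks. The KV-cache shape (layers, width, context length, bytes per value) is a placeholder for illustration, not the project's actual configuration.

```python
ROM_BANK_BYTES = 16 * 1024   # one switchable cartridge ROM bank
SRAM_BANK_BYTES = 8 * 1024   # one cartridge RAM bank


def banks_needed(total_bytes, bank_bytes):
    """Ceiling division: how many banks it takes to hold total_bytes."""
    return -(-total_bytes // bank_bytes)


def kv_cache_bytes(n_layers, kv_dim, context_len, bytes_per_value):
    """Keys plus values (factor 2), one kv_dim vector per layer per token."""
    return 2 * n_layers * kv_dim * context_len * bytes_per_value


# About 260K parameters at one byte each (INT8).
weight_bytes = 260_000
weight_banks = banks_needed(weight_bytes, ROM_BANK_BYTES)    # 16 banks

# Hypothetical cache shape, chosen only to illustrate the arithmetic.
cache = kv_cache_bytes(n_layers=5, kv_dim=32, context_len=64,
                       bytes_per_value=1)                    # 20480 bytes
cache_banks = banks_needed(cache, SRAM_BANK_BYTES)           # 3 banks
```

Even at one byte per weight, the model spans many ROM banks, so the inference loop has to switch banks as it walks through the layers. Under these placeholder numbers the cache alone would take most of the 32KB of work RAM, which is consistent with moving it to cartridge SRAM.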
- Model: TinyStories-260K, with weights quantized to INT8 and all math done in fixed point because the GBC has no FPU.
- Hardware: Stock Game Boy Color + EZ Flash Junior cartridge; no phone, PC, Wi-Fi, or cloud.
- Performance: Extremely slow (seconds per token) and the output is gibberish, but prefill and autoregressive generation both run on-device.
Why It Matters
Shows that transformer inference is feasible on handheld hardware from 1998, pushing the edge of embedded AI.