Open Source

Karpathy's MicroGPT running at 50,000 tps on an FPGA

A tiny 4,192-parameter model runs at 50,000 tokens per second using onboard ROM.

Deep Dive

Sure, it's only 4,192 parameters, but it's a start. A new FPGA project stores the model's weights in onboard ROM for speed, eliminating external memory latency; with 16-bit weights, current FPGAs max out at roughly 20–30 million parameters on-chip. Projects like this one, and companies like Taalas, might push vendors toward more onboard ROM or dedicated small-language-model (SLM) FPGAs. Full code and write-up are on GitHub (TALOS‑V2) and at talos.wtf.

Key Points
  • Runs at 50,000 tokens per second on an FPGA with only 4,192 parameters.
  • Weights stored in onboard ROM eliminate external memory latency, enabling extreme throughput.
  • Current FPGA technology limits onboard ROM to ~20–30 million 16-bit parameters, signaling room for future dedicated SLM accelerators.
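The numbers above can be sanity-checked with quick back-of-envelope arithmetic. This is a rough sketch, not figures from the project: the 2-FLOPs-per-parameter-per-token rule of thumb and 16-bit weights are assumptions.

```python
# Back-of-envelope estimate (assumptions, not project measurements):
# a transformer forward pass costs roughly 2 FLOPs per parameter per token.
PARAMS = 4_192
BYTES_PER_WEIGHT = 2          # assuming 16-bit weights
TOKENS_PER_SEC = 50_000

rom_bytes = PARAMS * BYTES_PER_WEIGHT            # on-chip ROM footprint
flops_per_token = 2 * PARAMS                     # rule-of-thumb compute cost
throughput_flops = flops_per_token * TOKENS_PER_SEC

print(f"ROM needed: {rom_bytes / 1024:.1f} KiB")
print(f"Compute: {throughput_flops / 1e6:.1f} MFLOP/s")
```

At ~8 KiB of weights and well under a GFLOP/s of compute, the whole model fits comfortably in on-chip block RAM, which is why external memory latency never enters the picture.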

Why It Matters

Demonstrates that ultra-small models with on-chip weights can deliver extreme throughput on FPGAs, hinting at a new class of low-power, real-time AI accelerators.