Open Source

MironV runs 260K-param LLM on emulated 1990s CPU in an 18-year-old RTOS

r/LocalLLaMA May 28, 2026

⚡ A tiny LLM generates one token every 2–4 seconds on a ColdFire MCF5307 emulator.

Deep Dive

Developer MironV has revived a 2008 university project, a custom RTOS for the Freescale ColdFire MCF5307 (a Motorola 68K derivative), by building a JavaScript CPU emulator from scratch and reverse-engineering the original ROM using Claude and Qwen. He then pushed the stack to its limits by running a small LLM on it. Using Karpathy's llama2.c with the 260K-parameter stories260K model (trained on TinyStories), he fit the ~500KB of weights into the 16MB of emulated memory by shrinking the kernel stack.

The ColdFire MCF5307 has no floating-point unit, so every floating-point operation has to be emulated in software. To avoid that cost, MironV quantized the model to INT8 with per-row scaling, which turns the matrix multiplications into pure integer math. For the remaining float operations, he used Carmack's fast inverse square root (from Quake III Arena) and lookup tables for RoPE (rotary position embeddings), leaving emulated floating point only for the infrequent softmax and RMSNorm steps. The result is one token every 2–4 seconds of mostly coherent TinyStories-style English. The whole project runs in a browser and is open source. Next, MironV plans to move the stack to an FPGA for real-time speeds.
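To make the per-row INT8 idea concrete, here is a minimal sketch of how such a quantized matrix-vector product can work. It is not MironV's code: the function names, the symmetric [-127, 127] range, and the choice to quantize the input vector with a single scale are all assumptions made for illustration, and Python is used for brevity even though the project builds on llama2.c, which is written in C.

```python
def quantize_row(row):
    """Quantize one row of floats to int8-range integers plus one float scale.

    Assumption: symmetric quantization to [-127, 127]; the article only says
    "INT8 with per-row scaling".
    """
    max_abs = max(abs(v) for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale


def quantize_matrix(weights):
    """Quantize each row independently, returning (int rows, per-row scales)."""
    pairs = [quantize_row(r) for r in weights]
    return [q for q, _ in pairs], [s for _, s in pairs]


def int8_matvec(q_weights, w_scales, q_x, x_scale):
    """Matrix-vector product whose inner loop uses only integer arithmetic."""
    out = []
    for q_row, w_scale in zip(q_weights, w_scales):
        acc = 0
        for w, x in zip(q_row, q_x):
            acc += w * x  # integer multiply-accumulate, no float emulation
        # A single float rescale per output row instead of one per weight.
        out.append(acc * w_scale * x_scale)
    return out


# Hypothetical toy data, not taken from the stories260K model.
weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
x = [1.0, 0.5, -0.5]

q_w, w_scales = quantize_matrix(weights)
q_x, x_scale = quantize_row(x)
y = int8_matvec(q_w, w_scales, q_x, x_scale)
```

The point of the design is where the float work ends up: the inner loop touches only integers, and floating point is needed once per output row for the rescale, which matters when each float operation is emulated in software.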

Key Points

260K-parameter LLM runs on an emulated 1993 Freescale ColdFire MCF5307 (68K derivative) with 16MB RAM.
INT8 quantization, Carmack's fast inverse square root, and RoPE lookup tables avoid FPU emulation bottlenecks.
Generates one token every 2–4 seconds, producing TinyStories-style English; available as a live browser demo.
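The two float-avoidance tricks named in the key points can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the base of 10000, and the table layout are hypothetical, and the article does not say how the RoPE table is built or indexed. Python is used for brevity; the project itself builds on the C code of llama2.c.

```python
import math
import struct


def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for x > 0 with the Quake III Arena bit trick."""
    i = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bits as uint32
    i = 0x5F3759DF - (i >> 1)                         # magic-constant first guess
    y = struct.unpack("<f", struct.pack("<I", i))[0]  # reinterpret bits as float32
    return y * (1.5 - 0.5 * x * y * y)                # one Newton-Raphson refinement


def build_rope_table(seq_len, head_dim, base=10000.0):
    """Precompute (cos, sin) for every position and every dimension pair.

    Assumption: built once up front, so no sin/cos/pow calls happen per token.
    """
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, head_dim, 2):
            angle = pos / (base ** (i / head_dim))
            row.append((math.cos(angle), math.sin(angle)))
        table.append(row)
    return table


def apply_rope(vec, pos, table):
    """Rotate consecutive pairs of vec using the precomputed table entries."""
    out = list(vec)
    for k, (c, s) in enumerate(table[pos]):
        a, b = vec[2 * k], vec[2 * k + 1]
        out[2 * k] = a * c - b * s
        out[2 * k + 1] = a * s + b * c
    return out


# Hypothetical sizes, chosen only to keep the example small.
rope_table = build_rope_table(seq_len=8, head_dim=4)
rotated = apply_rope([1.0, 0.0, 0.0, 1.0], 3, rope_table)
approx = fast_inv_sqrt(4.0)  # close to, but not exactly, 0.5
```

The inverse square root replaces a division and a square root with an integer subtraction, a shift, and a few multiplies, which is the shape of computation RMSNorm needs. The lookup table trades a small amount of memory for removing trigonometric calls from the per-token path.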

Why It Matters

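Shows how far inference can be optimized for constrained targets: with integer matrix math and a few numerical shortcuts, even an emulated 1990s CPU with no FPU can run a small LLM, hinting at LLM viability on resource-constrained hardware.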
