Open Source

I tested 8 LLMs as tabletop GMs - a 27B model beat the 405B on narrative quality

A 27-billion-parameter model outperformed a 405B giant in writing compelling tabletop RPG scenes, challenging assumptions about model size.

Deep Dive

A developer built an open-source, model-agnostic agentic system that runs tabletop RPG sessions and tested 8 different LLMs as game masters. The system requires a model to chain 4-6 tool calls (bash commands, file reads) before delivering narration, which pushes local models to their limits. Testing showed that smaller models such as Mistral Small 3.1 24B suffered attention drift after several sequential tool calls, and that practical local inference for these agentic workflows appears to require 70B+ models on hardware with 64GB+ of RAM.
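The post doesn't publish the harness internals, but the general shape of such a tool-chaining loop is easy to sketch. In the minimal Python sketch below, the tool names (`bash`, `read_file`), the `model_step` callback, and the reply format are hypothetical stand-ins, not the project's actual API:

```python
import subprocess
from pathlib import Path
from typing import Callable

# Hypothetical tool registry: the project's real tool names aren't documented,
# so "bash" and "read_file" are illustrative stand-ins.
TOOLS: dict[str, Callable[[str], str]] = {
    "bash": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=10
    ).stdout,
    "read_file": lambda path: Path(path).read_text(encoding="utf-8"),
}

MAX_TOOL_CALLS = 6  # the post reports models chaining 4-6 calls per turn


def run_gm_turn(model_step: Callable[[list[dict]], dict], history: list[dict]) -> str:
    """Drive one game-master turn: let the model call tools until it narrates.

    `model_step` is any chat-completion wrapper that returns either
    {"tool": name, "arg": value} or {"narration": text} (assumed format).
    """
    for _ in range(MAX_TOOL_CALLS):
        reply = model_step(history)
        if "narration" in reply:  # model has gathered enough state; deliver the scene
            return reply["narration"]
        tool, arg = reply["tool"], reply["arg"]
        result = TOOLS[tool](arg)  # execute the requested tool
        history.append({"role": "tool", "name": tool, "content": result})
    # The failure mode the post describes: the model never gets back to narrating.
    return "(GM stalled: tool-call budget exhausted)"
```

The attention-drift finding maps onto the loop above: several iterations in, weaker models reportedly lose track of the pending narration task and keep issuing tool calls that no longer serve it.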

Beyond basic functionality, the developer built a narrative quality probe to test which models actually write compelling scenes. Using six scenarios in a shared campaign setting, the probe evaluated atmospheric writing and scene construction. The surprising result: a 27-billion-parameter model outperformed a massive 405-billion-parameter model in narrative quality, challenging the assumption that bigger always means better for creative tasks. This suggests that specialized tuning and architecture may matter more than raw parameter count for certain applications.
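How the probe scored outputs isn't specified, so the sketch below only shows the harness shape: run every model on the same scenario prompts and collect outputs for side-by-side review. The `SCENARIOS` prompts, the `probe` function, and the `models` mapping are illustrative assumptions, not the author's code:

```python
import json
from pathlib import Path
from typing import Callable

# Illustrative prompts; the post's six actual scenarios aren't public.
SCENARIOS = [
    "The party enters an abandoned dwarven forge at midnight.",
    "A storm traps the characters in a roadside inn with a nervous stranger.",
    # ...four more prompts sharing the same campaign setting
]


def probe(models: dict[str, Callable[[str], str]], out_dir: str = "probe_out") -> None:
    """Run every model on every scenario and save outputs for side-by-side review.

    `models` maps a model name to any text-in/text-out completion function.
    Scoring (atmosphere, scene construction) happens offline here; whether the
    original author scored manually or with a judge model is not stated.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for name, generate in models.items():
        results = [
            {"scenario": prompt, "scene": generate(prompt)} for prompt in SCENARIOS
        ]
        (out / f"{name}.json").write_text(json.dumps(results, indent=2))
```

Holding the scenarios and campaign setting fixed across all eight models is what makes the 27B-versus-405B comparison meaningful: every model writes the same scenes, so differences reflect writing quality rather than prompt luck.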

Key Points
  • 27B parameter model beat 405B model in narrative quality testing for tabletop RPG scenes
  • Reliable local inference for agentic workflows requires 70B+ models on 64GB+ RAM hardware
  • Smaller models like Mistral Small 3.1 24B showed attention drift after 4-5 sequential tool calls

Why It Matters

Shows smaller, specialized models can outperform giants on creative tasks, potentially lowering deployment costs for narrative AI applications.