Learning Reasoning World Models for Parallel Code
A 7B model boosts race-outcome prediction accuracy by 8.5 percentage points, offering a cheaper stand-in for external tool calls.
Large language models excel at serial code generation but struggle with parallel code due to scarce training data. A common workaround uses coding agents that interact with external tools, but these tool calls are costly and often impractical for partially written code. Researchers from MIT, Georgia Tech, and Lawrence Livermore National Laboratory propose Parallel-Code World Models (PCWMs), reasoning LLMs trained to predict tool outcomes—like data races and performance profiles—directly from parallel source code, eliminating the need for external tools.
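To make that prediction target concrete, the sketch below (an illustration written for this article, not taken from the paper) shows the kind of question a PCWM answers: given parallel source code, will a race detector report a data race? The paper's benchmarks target compiled parallel code; plain Python threads are used here only to keep the example self-contained.

```python
import threading

counter = 0  # shared state updated by several threads without a lock

def worker(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        # Non-atomic read-modify-write: two threads can read the same value
        # and both write back value + 1, so increments are lost.
        value = counter
        counter = value + 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With proper synchronization this would print 400000; the racy version
# often prints less. A PCWM is trained to predict this kind of outcome
# from the source alone, without actually running a race detector.
print("counter =", counter)
```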
To train PCWMs, the team built a novel exploration pipeline that generates diverse parallel-coding problems and candidate implementations across multiple domains, then executes them with external tools to record data races and performance profiles. From these runs, they synthesized hindsight reasoning traces that causally connect source code to the observed tool outcomes. Fine-tuning a 7B model on this data improved race-outcome prediction accuracy from 64.3% to 72.8%, and a fine-tuned 8B model lifted performance-profiling accuracy from 49.3% to 58.6%. When open-weight models were tasked with fixing data races, world-model feedback improved their race-fixing rates by 2.7%-9.1% with the 7B PCWM and by 6.1%-11.1% with a 14B PCWM, outperforming self-feedback approaches.
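The data-generation loop can be pictured roughly as follows. This is a hypothetical sketch: the helper names (`generate_candidates`, `run_race_detector`, `synthesize_hindsight_trace`) and the `TrainingExample` record are stand-ins for the paper's actual components, which are reduced to stubs here.

```python
from dataclasses import dataclass

@dataclass
class TrainingExample:
    source_code: str        # candidate parallel implementation
    tool_outcome: str       # e.g. "race on `sum` at line 12" or "race-free"
    reasoning_trace: str    # hindsight explanation linking code to outcome

def generate_candidates(problem: str, n: int) -> list[str]:
    """Stub: in the real pipeline, an LLM proposes n diverse implementations."""
    return [f"// impl {i} for: {problem}" for i in range(n)]

def run_race_detector(code: str) -> str:
    """Stub: in the real pipeline, the code is compiled and run under a
    dynamic race detector; here a canned outcome is returned."""
    return "race-free"

def synthesize_hindsight_trace(code: str, outcome: str) -> str:
    """Stub: in the real pipeline, an LLM writes a reasoning trace that
    causally connects the source code to the observed tool outcome."""
    return f"The loop carries no unguarded shared writes, so the tool reports: {outcome}"

def build_dataset(problems: list[str], candidates_per_problem: int = 4) -> list[TrainingExample]:
    dataset = []
    for problem in problems:
        for code in generate_candidates(problem, candidates_per_problem):
            outcome = run_race_detector(code)           # ground truth from tools
            trace = synthesize_hindsight_trace(code, outcome)
            dataset.append(TrainingExample(code, outcome, trace))
    return dataset  # fine-tuning data: code -> (reasoning trace, predicted outcome)

if __name__ == "__main__":
    examples = build_dataset(["parallel prefix sum", "histogram with OpenMP"])
    print(f"{len(examples)} training examples")
```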
- 7B PCWM improves race-outcome prediction accuracy from 64.3% to 72.8%
- 8B PCWM lifts performance-profiling accuracy from 49.3% to 58.6%
- World-model feedback improves race-fixing rates by up to 11.1%, outperforming self-feedback
Why It Matters
PCWMs could replace costly tool calls in parallel-coding agents, accelerating debugging and optimization for multi-threaded software.
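As a rough sketch of what that could look like, the loop below swaps a tool invocation for a world-model query when repairing a racy snippet; `query_pcwm` and `propose_fix` are hypothetical stand-ins for model calls, not an API from the paper.

```python
def query_pcwm(code: str) -> tuple[bool, str]:
    """Stub: the PCWM predicts the tool outcome from source code alone.
    A toy heuristic stands in for the model's prediction."""
    races = "#pragma omp critical" not in code and "reduction" not in code
    feedback = "predicted race on shared accumulator" if races else "predicted race-free"
    return races, feedback

def propose_fix(code: str, feedback: str) -> str:
    """Stub: an open-weight coding model revises the code given the feedback."""
    return code.replace("#pragma omp parallel for",
                        "#pragma omp parallel for reduction(+:sum)")

def repair(code: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        races, feedback = query_pcwm(code)   # no compiler or race detector run
        if not races:
            break
        code = propose_fix(code, feedback)
    return code

if __name__ == "__main__":
    buggy = "#pragma omp parallel for\nfor (int i = 0; i < n; ++i) sum += a[i];"
    print(repair(buggy))
```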