MindLoom framework generates frontier-level reasoning data with thought mode engineering
New method decomposes solutions into atomic thought modes to synthesize diverse, hard problems.
Generating high-quality reasoning data for LLMs remains difficult: existing methods tend to produce a narrow range of problems and offer only unstable control over difficulty. In a new arXiv paper, researchers from Peking University introduce MindLoom, a framework that treats the difficulty of a reasoning problem as the accumulation of atomic knowledge-reasoning transformations called 'thought modes.' Given a set of hard problems with verified solutions, MindLoom first decomposes those solutions into thought mode chains, revealing the construction logic behind each problem. It then trains a retrieval model that matches problem states to compatible thought modes, which guides the choice of reasoning challenges to introduce during synthesis. New problems are composed by iteratively applying retrieved thought modes to seed questions, with distribution-aligned sampling to encourage diverse reasoning coverage. Finally, a rollout-based judging stage labels the generated questions by difficulty and supplies judged-correct responses for supervised fine-tuning.
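The composition step can be illustrated with a minimal sketch. This is not the authors' implementation: the thought-mode names, the target distribution, the deficit-based weighting, and the `toy_retrieve` and `toy_apply` helpers are all assumptions made for illustration. In the paper's pipeline, retrieval is done by a trained model, and applying a thought mode presumably involves an LLM rewriting the question.

```python
import random
from collections import Counter

# Hypothetical thought-mode labels and target usage distribution (not from the paper).
TARGET = {"case_split": 0.4, "invariant": 0.3, "substitution": 0.3}


def sample_mode(candidates, usage, target, rng):
    """Pick one mode, favoring modes whose usage share is below their target share."""
    total = sum(usage.values()) + 1  # +1 avoids division by zero on the first draw
    weights = []
    for mode in candidates:
        deficit = target.get(mode, 0.0) - usage[mode] / total
        weights.append(max(deficit, 0.0) + 1e-6)  # small floor keeps every weight positive
    return rng.choices(candidates, weights=weights, k=1)[0]


def compose(seed, retrieve, apply_mode, target, steps, rng, usage):
    """Iteratively apply retrieved thought modes to a seed question."""
    state, chain = seed, []
    for _ in range(steps):
        candidates = retrieve(state)
        if not candidates:
            break  # nothing compatible with the current problem state
        mode = sample_mode(candidates, usage, target, rng)
        state = apply_mode(state, mode)
        usage[mode] += 1
        chain.append(mode)
    return state, chain


# Toy stand-in for the trained retrieval model (assumption): every mode is compatible.
def toy_retrieve(state):
    return list(TARGET)


# Toy stand-in for applying a thought mode (assumption): just tag the question text.
def toy_apply(state, mode):
    return f"{state} + [{mode}]"


usage = Counter()
rng = random.Random(0)
question, chain = compose("Seed question", toy_retrieve, toy_apply, TARGET, 3, rng, usage)
```

Sharing one `usage` counter across many seed questions is what nudges the overall mix of applied modes toward the target distribution. Whether MindLoom aligns its sampling this way is not stated in the summary above.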
MindLoom was evaluated on nine benchmarks, spanning five STEM disciplines (including mathematics, physics, and computer science) and four mathematical reasoning tasks, across multiple model families and sizes. Models fine-tuned on MindLoom-generated data compared favorably with base models, distillation, and external-data baselines. Ablation studies confirmed the contribution of each component, and further analysis showed that MindLoom covers a broad range of reasoning patterns while maintaining useful difficulty control. The authors have open-sourced their implementation, making it available for further research. The work offers a principled approach to scaling up diverse, controllable reasoning data, a key bottleneck in improving LLM reasoning capabilities.
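One way to picture how rollout-based judging can support difficulty control is to label a question by the fraction of sampled responses a judge accepts. The sketch below is illustrative only: the function name, the thresholds, the three-way labels, and the `solve` and `is_correct` callables are assumptions, not details taken from the paper.

```python
from itertools import cycle


def judge_by_rollouts(question, solve, is_correct, n_rollouts=8,
                      easy_at=0.75, hard_at=0.25):
    """Label a question by its rollout pass rate and keep judged-correct responses.

    `solve` and `is_correct` are hypothetical callables standing in for a
    policy model and a judge; the thresholds are arbitrary illustration values.
    """
    if n_rollouts <= 0:
        raise ValueError("n_rollouts must be positive")
    responses = [solve(question) for _ in range(n_rollouts)]
    kept = [r for r in responses if is_correct(question, r)]
    pass_rate = len(kept) / n_rollouts
    if pass_rate >= easy_at:
        label = "easy"
    elif pass_rate > hard_at:
        label = "medium"
    else:
        label = "hard"
    return label, pass_rate, kept


# Toy solver (assumption): cycles through canned answers instead of sampling a model.
answers = cycle(["42", "41", "42", "40"])
label, pass_rate, kept = judge_by_rollouts(
    "What is 6 * 7?",
    solve=lambda q: next(answers),
    is_correct=lambda q, r: r == "42",
    n_rollouts=4,
)
```

In this toy run, two of the four rollouts are accepted, so the question falls in the middle band and the two accepted responses are retained as candidate fine-tuning targets.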
- MindLoom decomposes hard problem solutions into atomic 'thought mode' chains to reveal construction logic.
- A retrieval model matches problem states to compatible thought modes for guided data synthesis.
- Models fine-tuned on MindLoom data compared favorably with baselines on 9 benchmarks spanning 5 STEM disciplines and 4 math reasoning tasks.
Why It Matters
Enables scalable, controllable generation of diverse reasoning data to improve LLM fine-tuning and frontier performance.