Research & Papers

GeoSym127K boosts geometric reasoning by 22 points on Qwen3-VL-8B

New neuro-symbolic framework cuts visual hallucinations with 127K verified geometry Q&As

Deep Dive

Large Multimodal Models (LMMs) frequently hallucinate on geometric reasoning tasks, both because their visual understanding is imprecise and because little chain-of-thought data comes with verifiable ground truth. To close this gap, a team of researchers from multiple institutions developed the GeoSym Engine, a scalable neuro-symbolic framework that combines a type-conditional grammar with an analytic SymGT solver. The engine produces exact symbolic ground truths and renders high-precision diagrams, avoiding the label noise of hand-labeled datasets.
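To make the idea of pairing a type-conditional generator with an analytic solver concrete, here is a minimal Python sketch. It is not the GeoSym Engine: the shape types, question kinds, and the names `QUESTION_KINDS`, `solve`, and `sample_problem` are all hypothetical, chosen only to show how picking a shape type first can constrain which questions are valid, and how a closed-form solver can return an exact answer instead of a hand-entered label.

```python
import random
from fractions import Fraction

# Hypothetical shape types and the question kinds valid for each one.
# This stands in for a type-conditional grammar; it is not GeoSym's.
QUESTION_KINDS = {
    "right_triangle": ["hypotenuse", "area"],
    "rectangle": ["area", "perimeter"],
}
PYTHAGOREAN_TRIPLES = [(3, 4, 5), (5, 12, 13), (8, 15, 17)]


def solve(shape, kind, params):
    """Analytic solver: return the exact answer as a Fraction."""
    if shape == "right_triangle":
        a, b, c = params  # legs a, b and hypotenuse c
        return Fraction(c) if kind == "hypotenuse" else Fraction(a * b, 2)
    if shape == "rectangle":
        w, h = params
        return Fraction(w * h) if kind == "area" else Fraction(2 * (w + h))
    raise ValueError(f"unknown shape: {shape}")


def sample_problem(rng):
    """Sample a shape, then parameters and a question valid for that shape."""
    shape = rng.choice(sorted(QUESTION_KINDS))
    kind = rng.choice(QUESTION_KINDS[shape])
    if shape == "right_triangle":
        scale = rng.randint(1, 4)
        params = tuple(scale * side for side in rng.choice(PYTHAGOREAN_TRIPLES))
        text = f"A right triangle has legs {params[0]} and {params[1]}. Find its {kind}."
    else:
        params = (rng.randint(1, 20), rng.randint(1, 20))
        text = f"A rectangle is {params[0]} by {params[1]}. Find its {kind}."
    return {"shape": shape, "kind": kind, "params": params,
            "question": text, "answer": solve(shape, kind, params)}


problem = sample_problem(random.Random(0))
print(problem["question"], "->", problem["answer"])
```

Because every answer comes out of `solve` rather than from an annotator, each generated question carries a ground truth that can be checked exactly, which is the property the article attributes to symbolic generation.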

Using this engine, they created GeoSym127K, a difficulty-stratified dataset containing 51K images, 127K symbolic-grounded questions, and 55K answer-verified chain-of-thought pairs. Fine-tuning Qwen3-VL-8B on this data yielded an absolute improvement of 22.21 percentage points on the MathVerse Vision-Only subset and raised WeMath accuracy to 61.52% (a gain of 6.19 points), outperforming proprietary models such as Doubao-1.8. In addition, reinforcement learning with verifiable rewards (RLVR, implemented with GRPO) reached a significantly higher performance ceiling when started from SFT-initialized checkpoints than zero-shot RL did, pointing to the scaling potential of verifiable reasoning synthesis.
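For readers unfamiliar with RLVR and GRPO, the sketch below shows the two ingredients in their simplest form: a reward that is 1 only when a model's final answer exactly matches a known ground truth, and GRPO-style advantages computed by normalizing rewards within a group of completions sampled for the same question. The function names, the exact-match rule, and the use of population standard deviation are assumptions made for illustration, not details of the authors' training setup.

```python
from fractions import Fraction
from statistics import mean, pstdev


def verifiable_reward(model_answer, ground_truth):
    """Return 1.0 if the answer string parses to exactly the ground truth, else 0.0."""
    try:
        return 1.0 if Fraction(model_answer.strip()) == ground_truth else 0.0
    except (ValueError, ZeroDivisionError):
        # Unparseable answers (or ones like "1/0") earn no reward.
        return 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions for one question whose exact answer is 15/2.
truth = Fraction(15, 2)
completions = ["15/2", "7.5", "8", "not a number"]
rewards = [verifiable_reward(c, truth) for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

When every completion in a group earns the same reward, all advantages are zero and the group contributes no learning signal. That is one plausible reason a checkpoint that already answers some questions correctly after SFT could be a better starting point for this kind of training.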

Key Points
  • GeoSym127K includes 127K questions with symbolic ground truths and 55K answer-verified CoT QA pairs
  • Qwen3-VL-8B gains 22.21 percentage points on MathVerse Vision-Only and reaches 61.52% on WeMath (+6.19 points)
  • RLVR with GRPO from SFT-initialized checkpoints reaches a higher performance ceiling than zero-shot RL

Why It Matters

Symbolically verifiable geometric data could close the reliability gap in multimodal AI reasoning for STEM education and robotics.