GeoSym127K boosts geometric reasoning by 22% on Qwen3-VL-8B

arXiv cs.CV May 19, 2026

⚡ New neuro-symbolic framework cuts visual hallucinations with 127K symbolically grounded geometry questions

Deep Dive

Large Multimodal Models (LMMs) frequently hallucinate on geometric reasoning tasks because their visual understanding is imprecise and chain-of-thought data with verifiable ground truth is scarce. To close this gap, a team of researchers from multiple institutions developed the GeoSym Engine, a scalable neuro-symbolic framework that combines a type-conditional grammar with an analytic SymGT solver. The engine produces exact symbolic ground truths and renders high-precision diagrams, eliminating the noise found in hand-labeled datasets.
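To make the idea of a type-conditional generator paired with an analytic solver concrete, here is a minimal Python sketch. It is not the GeoSym Engine's grammar or its SymGT solver, neither of which is described in this summary; the function name `sample_question`, the two figure types, and the question templates are all hypothetical. The point is only that the answer is derived analytically from the sampled parameters, so it is exact by construction rather than hand-labeled.

```python
import random
from fractions import Fraction


def sample_question(rng: random.Random) -> dict:
    """Sample a figure type first, then type-specific parameters and an exact answer.

    Hypothetical illustration only: the figure types and templates are assumptions,
    not the grammar used by the GeoSym Engine.
    """
    kind = rng.choice(["right_triangle", "isosceles_triangle"])
    if kind == "right_triangle":
        # Parameters for this type: two integer leg lengths.
        a, b = rng.randint(1, 12), rng.randint(1, 12)
        question = (
            f"A right triangle has legs of length {a} and {b}. "
            "What is the square of its hypotenuse?"
        )
        answer = Fraction(a * a + b * b)  # Pythagorean theorem, exact
    else:
        # Parameters for this type: an integer apex angle in degrees.
        apex = rng.randint(1, 179)
        question = (
            f"An isosceles triangle has an apex angle of {apex} degrees. "
            "What is each base angle in degrees?"
        )
        answer = Fraction(180 - apex, 2)  # angle sum of a triangle, exact
    return {"type": kind, "question": question, "answer": answer}


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_question(rng))
```

Storing answers as `Fraction` values rather than floats keeps the ground truth exact, which is what makes later answer verification a simple equality check.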

Using this engine, they created GeoSym127K, a difficulty-stratified dataset containing 51K images, 127K symbolically grounded questions, and 55K answer-verified chain-of-thought pairs. Fine-tuning Qwen3-VL-8B on this data yielded a +22.21% absolute improvement on the MathVerse Vision-Only subset and raised WeMath accuracy to 61.52% (a +6.19% gain), outperforming proprietary models such as Doubao-1.8. In addition, reinforcement learning with verifiable rewards (RLVR, via GRPO) started from SFT-initialized checkpoints reached significantly higher performance ceilings than zero-shot RL, which points to the scaling potential of verifiable reasoning synthesis.
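The RLVR step can be illustrated with a short Python sketch of the two pieces it relies on: a reward that is checked against an exact ground truth, and GRPO's group-relative advantage, which normalizes each sampled response's reward by the mean and standard deviation of its group. This is a simplified illustration, not the paper's training code; the function names, the binary 0/1 reward, the use of the population standard deviation, and the `eps` value are assumptions.

```python
from fractions import Fraction
from statistics import mean, pstdev


def verifiable_reward(predicted: str, ground_truth: Fraction) -> float:
    """Return 1.0 if the predicted answer equals the exact ground truth, else 0.0.

    Assumption: answers are plain numbers such as "89.5" or "179/2";
    anything that cannot be parsed earns no reward.
    """
    try:
        return 1.0 if Fraction(predicted.strip()) == ground_truth else 0.0
    except (ValueError, ZeroDivisionError):
        return 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of responses to the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    truth = Fraction(179, 2)
    responses = ["89.5", "179/2", "90", "n/a"]  # hypothetical model outputs
    rewards = [verifiable_reward(r, truth) for r in responses]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this example the two correct responses receive positive advantages and the two incorrect ones receive negative advantages. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.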

Key Points

GeoSym127K includes 127K questions with symbolic ground truths and 55K answer-verified CoT QA pairs
Qwen3-VL-8B gains +22.21% on MathVerse Vision-Only and reaches 61.52% on WeMath (+6.19%)
RLVR with GRPO from SFT checkpoints outperforms zero-shot RL, showing robust scaling

Why It Matters

Symbolically verifiable geometric data could close the reliability gap in multimodal AI reasoning for STEM education and robotics.
