AI Safety

PhysicsWallah's Aryabhata 2 uses RL to beat its base model on STEM exams with up to 64% fewer output tokens

New model solves JEE and NEET problems using up to 64% fewer output tokens than its base model, GPT-OSS-20B

Deep Dive

Researchers from PhysicsWallah have introduced Aryabhata 2, a specialized reasoning model for competitive STEM examinations such as JEE and NEET. The model is built by post-training GPT-OSS-20B using reinforcement learning with verifiable rewards, that is, rewards computed by checking the model's answer against a known correct one, on questions drawn from PhysicsWallah's extensive internal question banks. Training combined prolonged reinforcement learning with broader exploration, achieved by progressively increasing the rollout group size. The aim was to improve multi-step symbolic reasoning, precise numerical computation, and conceptual understanding across physics, chemistry, and mathematics.
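To make the training ingredients above concrete, here is a minimal sketch of a verifiable reward, a group-relative advantage, and a growing rollout-group schedule. It is an illustration under stated assumptions, not PhysicsWallah's implementation: the summary does not name the RL algorithm, so the mean/std normalization within a rollout group (in the style of group-relative methods) is an assumption, and the function names, the tolerance, and the group sizes 8, 16, 32 are hypothetical.

```python
import math
import statistics


def verifiable_reward(predicted: str, reference: str, rel_tol: float = 1e-6) -> float:
    """Return 1.0 if the final answer matches the reference, else 0.0.

    Numeric answers are compared with a relative tolerance; anything that
    does not parse as a number (e.g. an option letter) is compared as text.
    """
    try:
        matches = math.isclose(float(predicted), float(reference), rel_tol=rel_tol)
    except ValueError:
        matches = predicted.strip().lower() == reference.strip().lower()
    return 1.0 if matches else 0.0


def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one rollout group (assumed, not confirmed by the source)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # Every rollout got the same reward, so the group carries no learning signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


def rollout_group_size(step: int, total_steps: int, sizes: tuple[int, ...] = (8, 16, 32)) -> int:
    """Hypothetical schedule: move to a larger rollout group as training progresses."""
    stage = min(step * len(sizes) // total_steps, len(sizes) - 1)
    return sizes[stage]
```

In this sketch, a larger group means more sampled solutions per question, which raises the chance that at least one rollout earns a nonzero reward on a hard problem. That is one plausible reading of "broadened exploration"; the actual schedule used for Aryabhata 2 is not given in this summary.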

Aryabhata 2 was evaluated on multiple benchmarks, including JEE Main, JEE Advanced, NEET, AIME, HMMT, MMLU-Pro, MMLU-Redux 2.0, and GPQA. Results show that it outperforms its base model, GPT-OSS-20B, on competitive STEM reasoning while using up to 64% fewer output tokens. Because shorter answers cost less to generate, this efficiency matters for scaling AI tutoring to millions of student doubts and for making high-quality STEM exam preparation more accessible and cost-effective.
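As a rough illustration of what the token reduction means at scale, the arithmetic can be sketched as below. The 64% figure is the reported upper bound ("up to"), so this is a best case; the function name and the workload numbers in the usage note are hypothetical and do not come from the source.

```python
def output_tokens_after_reduction(base_tokens: float, reduction_pct: float = 64.0) -> float:
    """Output tokens left after cutting `reduction_pct` percent from a baseline.

    The default of 64 is the reported best case ("up to 64% fewer"), so the
    result is a lower bound on tokens used, not a typical value.
    """
    if not 0.0 <= reduction_pct <= 100.0:
        raise ValueError("reduction_pct must be between 0 and 100")
    return base_tokens * (100.0 - reduction_pct) / 100.0
```

For example, if the base model needed 1,000 output tokens for an answer, the best case would be 360 tokens. Across a hypothetical 1,000,000 doubts of that size, output would fall from 1,000,000,000 tokens to 360,000,000.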

Key Points
  • Outperforms its base model, GPT-OSS-20B, on competitive STEM reasoning; evaluated on JEE Main, JEE Advanced, NEET, AIME, HMMT, MMLU-Pro, MMLU-Redux 2.0, and GPQA
  • Uses up to 64% fewer output tokens than the base model, improving efficiency
  • Trained on PhysicsWallah's internal question banks via RL with verifiable rewards

Why It Matters

Fewer output tokens per answer lower the cost of serving each student doubt, which could make AI tutoring for exams like JEE and NEET practical at the scale of millions of students.