Open Source

Hy3 preview tops leaderboard with 87.8 on CHSBO 2025, beating GPT and Gemini

Frontier reasoning models are multiplying so fast that their version numbers are getting hard to keep straight.

Deep Dive

The frontier reasoning-model race is accelerating. A new entrant, Hy3 preview, has claimed the top spot on the CHSBO 2025 benchmark with a score of 87.8, ahead of both GPT-5.4 xhigh and Gemini 3.1 Pro. The result reshuffles the leaderboard and signals that the competition is far from settled. A post by user ExoticYesterday8282 on a tech forum highlights the confusion over version numbers as models proliferate faster than developers can evaluate them.

The key question the post raises is whether Hy3 preview's benchmark success carries over to real-world coding and mathematics tasks. The alternative is that it simply reflects benchmark hardening, in which models are over-optimized for specific test sets. As the reasoning race heats up, practitioners need to weigh excitement over leaderboard gains against rigorous validation in production environments.

Key Points
  • Hy3 preview scored 87.8 on CHSBO 2025, surpassing GPT-5.4 xhigh and Gemini 3.1 Pro.
  • The rapid pace of new model releases is creating version fatigue among developers.
  • Skepticism remains about whether Hy3's benchmark gains reflect real-world coding and math ability or benchmark hardening.

Why It Matters

Developers must rapidly evaluate new models without falling for benchmark hype that may not reflect real-world reasoning ability.