The Neural Feed Article

RealMath-Eval reveals LLM judges fail at real student math grading


Researchers introduced RealMath-Eval, a benchmark of 224 real high school exam responses. They found that even top LLM judges score poorly (MSE ~2.96) compared
