Media & Culture

GPT-5.4 Thinking benchmarks

The new model reportedly scores 92% on MMLU and 85% on GPQA, outperforming its predecessor.

Deep Dive

A viral leak suggests OpenAI has internally tested a new model, GPT-5.4, which has achieved record-breaking scores on major AI reasoning benchmarks. The reported results, shared on social media, indicate the model scored 92% on the MMLU (Massive Multitask Language Understanding) benchmark and 85% on the notoriously difficult GPQA (Graduate-Level Google-Proof Q&A) benchmark. These scores, if accurate, would represent a substantial performance jump over the current flagship model, GPT-4o, particularly in areas requiring deep, multi-step reasoning and expert-level knowledge. The leak has ignited speculation about the model's architecture and the timing of a public release.

While OpenAI has not officially confirmed GPT-5.4's existence or these specific results, the benchmarks cited are critical industry standards for evaluating AI reasoning. MMLU tests broad knowledge across 57 subjects, while GPQA is a "Google-proof" benchmark for graduate-level scientific reasoning, designed to be extremely challenging even for human domain experts. A score of 85% on GPQA would indicate a model capable of advanced scientific and technical problem-solving. The immediate implication is a new tier of AI assistant for complex R&D, data science, and academic research. The AI community is now watching for an official announcement, which would likely detail the model's capabilities, context window, and multimodal features.
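For context on what numbers like "92% on MMLU" mean: both benchmarks are scored as plain accuracy over multiple-choice questions. Here is a minimal sketch of that computation; the `Item` class and `benchmark_accuracy` function are illustrative names, not part of any official evaluation harness.

```python
from dataclasses import dataclass


@dataclass
class Item:
    question: str
    choices: list[str]  # e.g. four answer options, as in MMLU and GPQA
    answer: int         # index of the correct choice


def benchmark_accuracy(items: list[Item], predict) -> float:
    """Fraction of items where the model's chosen index matches the
    labeled answer -- the headline percentage benchmarks report."""
    correct = sum(1 for item in items if predict(item) == item.answer)
    return correct / len(items)


# Toy example with a trivial "model" that always picks the first choice.
items = [
    Item("2 + 2 = ?", ["4", "5", "6", "7"], 0),
    Item("Capital of France?", ["Berlin", "Paris", "Rome", "Madrid"], 1),
]
score = benchmark_accuracy(items, lambda item: 0)
print(f"{score:.0%}")  # 50%
```

A reported 92% on MMLU would mean the model selected the labeled answer on roughly 92 of every 100 such questions, across all 57 subject areas.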

Key Points
  • Reported 92% score on MMLU benchmark, indicating superior broad knowledge and understanding.
  • Alleged 85% score on the expert-level GPQA benchmark, suggesting breakthrough scientific reasoning.
  • Performance reportedly surpasses GPT-4o, signaling a major step in AI reasoning capability.

Why It Matters

Advances in reasoning benchmarks directly translate to more reliable AI for complex analysis, research, and technical problem-solving.