Models & Releases

GPT-5.2 vs Gemini 3: OpenAI's new model edges Gemini on benchmarks but loses visitors

OpenAI launches GPT-5.2 amid leaked 'code red' memo as visitors drop 6%

Deep Dive

OpenAI's GPT-5.2 arrives amid an intense rivalry with Google's Gemini 3, which launched two weeks ago; OpenAI's web traffic has reportedly dropped 6% since then. Benchmark results show a close race: GPT-5.2 scores higher on SWE-bench (80% vs 76.2%), GPQA Diamond (92.4% vs 91.9%), and AIME 2025 without tools (100% vs 95%), while Gemini 3 leads on HLE (37.5% vs 34.5%) and MMMLU (91.8% vs 89.6%). Each model is stronger in different reasoning domains, with GPT-5.2 aimed at professional knowledge work such as spreadsheets, presentations, and complex multi-step projects.
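To make the size of each gap easier to compare, the scores quoted above can be tabulated in a few lines of Python. This is only an illustrative sketch: the `SCORES` table and the `margins` helper are names made up for this example, and the numbers are simply the ones reported in this article.

```python
# Scores as quoted in the article, in percent: (GPT-5.2, Gemini 3).
SCORES = {
    "SWE-bench": (80.0, 76.2),
    "GPQA Diamond": (92.4, 91.9),
    "AIME 2025 (no tools)": (100.0, 95.0),
    "HLE": (34.5, 37.5),
    "MMMLU": (89.6, 91.8),
}


def margins(scores):
    """Return {benchmark: (leader, gap in percentage points)}.

    Hypothetical helper for illustration. None of the quoted pairs is a
    tie, so no tie-breaking rule is needed here.
    """
    result = {}
    for name, (gpt, gemini) in scores.items():
        leader = "GPT-5.2" if gpt > gemini else "Gemini 3"
        # Round to one decimal to hide floating-point noise.
        result[name] = (leader, round(abs(gpt - gemini), 1))
    return result


if __name__ == "__main__":
    for name, (leader, gap) in margins(SCORES).items():
        print(f"{name:<22} {leader:<9} +{gap} pts")
```

On these figures the gaps range from half a point (GPQA Diamond) to five points (AIME 2025 without tools), which is why the race reads as close rather than lopsided.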

On LMArena, GPT-5.2-high currently holds second place for web development behind Claude Opus 4.5, with Gemini 3 Pro in fourth and the base GPT-5.2 in sixth. However, Gemini 3 tops the overall leaderboards for text, vision, image edit, and search, while GPT-5.2 remains unranked overall. On features, Gemini offers integrated image and video generation, while ChatGPT users must turn to Sora separately for video. Both are accessible via APIs and enterprise systems, but Google's broader media generation capabilities give it an edge for creative workflows.

Key Points
  • GPT-5.2 leads on SWE-bench (80% vs 76.2%) and AIME 2025 without tools (100% vs 95%); Gemini 3 leads on HLE (37.5% vs 34.5%) and MMMLU (91.8% vs 89.6%).
  • LMArena: GPT-5.2-high ranks #2 in web development and Gemini 3 Pro #4, but Gemini 3 leads the overall text, vision, image edit, and search leaderboards.
  • OpenAI lost 6% of its visitors in the two weeks since Gemini 3's launch, per former Googler Deedy Das; Sam Altman declared a 'code red' in a leaked memo.

Why It Matters

Professionals must pick models carefully: GPT-5.2 excels at coding and math, Gemini 3 at broad reasoning and media.