Models & Releases

Artificial Analysis launches Intelligence Index v4.1 with 9 new benchmarks

Artificial Analysis, July 02, 2026

⚡ Claude Sonnet 5 and GLM-5.2 lead the updated rankings in the agentic and open-weights categories.

Deep Dive

Artificial Analysis, an independent AI benchmarking platform, has released Intelligence Index v4.1, an update that introduces nine new evaluations intended to help users select the best model and provider for their use case. The new benchmarks are GDPval-AA V2 (knowledge work), τ³-Banking (financial reasoning), Terminal-Bench v2.1 (command-line tasks), SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, and AA-LCR. The index now covers 525 models from providers such as Anthropic, OpenAI, and Google, as well as leading open-weights developers, and includes a personalized recommender that prioritizes intelligence, speed, and cost.

Notable additions include Claude Sonnet 5, which shows strong agentic performance but at a higher cost per task; GLM-5.2, the new leading open-weights model; and GPT-5.5 Instant (June 2026), aimed at speed-optimized workloads. New proprietary benchmarks such as AA-Briefcase (long-horizon knowledge work) and AA-AgentPerf (hardware performance for agents) let users compare agents for general work, coding, and customer support. The changelog also lists evaluations for DiffusionGemma 26B A4B, Nex-N2-Pro, Grok Build 0.1 0616, Kimi K2.7 Code, and a Speech-to-Speech Index, making this the most extensive update to date.

Key Points

Intelligence Index v4.1 includes 9 new evaluations covering financial reasoning, command-line tasks, and knowledge work, assessing 525 models across proprietary and open-weights categories.
Claude Sonnet 5 debuted with strong agentic capabilities, while GLM-5.2 became the top open-weights model on the index, and GPT-5.5 Instant offers faster inference for high-throughput tasks.
New benchmarks like AA-Briefcase and AA-AgentPerf provide granular insights into long-horizon knowledge work and hardware-level agent performance, enabling tailored model selection.

Why It Matters

Professionals can now make data-driven decisions when choosing AI models, balancing intelligence, speed, and cost with real-world agentic benchmarks.
