Zhipu AI's open-weight GLM 5.2 beats Claude Code at IDOR detection for one-sixth the cost
Open-weight GLM 5.2 scores 39% F1 on IDOR detection, beating Claude Code's 32%, at $0.17 per bug.
Zhipu AI (Z.ai) released GLM 5.2 on June 13, 2026, and published the open weights under an MIT license three days later. The Mixture-of-Experts model has 750 billion total parameters (40B active per token) and supports up to 1M tokens of usable context. On Semgrep's IDOR (Insecure Direct Object Reference) detection benchmark, GLM 5.2 scored 39% F1, beating Claude Code (32%) and every other prompt-only model, including Claude Opus 4.8. Semgrep's own multimodal pipeline achieved 53-61% F1, but it relies on a purpose-built harness that handles endpoint discovery and context filtering. GLM 5.2, by contrast, ran in a simple Pydantic AI harness with only a basic prompt and no guided navigation.
On coding benchmarks, GLM 5.2 posts 81.0 on Terminal-Bench 2.1, behind Claude Opus 4.8's 85.0, and 62.1 on SWE-bench Pro, where it edges closed frontier models. Its inference cost is roughly one-sixth that of comparable frontier models, making it a strong candidate for security teams that need open-weight, on-premise deployment.
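For readers unfamiliar with the bug class the benchmark targets, the following is a minimal, hypothetical sketch of an IDOR and its fix. The `Invoice` type, the in-memory `INVOICES` store, and both handler functions are illustrative assumptions, not code from Semgrep's benchmark or from any model's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    amount_cents: int


# Hypothetical in-memory store standing in for a database table.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, amount_cents=2500),
    2: Invoice(invoice_id=2, owner_id=200, amount_cents=9900),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    """IDOR: the client-supplied ID is used directly, with no ownership check."""
    return INVOICES[invoice_id]


def get_invoice_checked(current_user_id: int, invoice_id: int) -> Invoice:
    """Fix: confirm the object belongs to the caller before returning it."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != current_user_id:
        # Same error for "missing" and "not yours" so IDs cannot be enumerated.
        raise PermissionError("invoice not found or not accessible")
    return invoice
```

In the vulnerable handler, user 100 can read user 200's invoice simply by requesting ID 2. Spotting that an authorization check is missing, rather than that some line is syntactically wrong, is a large part of why this class of bug is hard for pattern-based tools and a natural test for LLM-based detection.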
The result underscores a key insight from Semgrep's experiment: the gap between open-weight and closed models on specialized tasks like vulnerability detection is narrowing, and much of the performance difference previously attributed to the model may actually come from the surrounding harness. Because GLM 5.2 is open-weight, security teams can run it entirely inside their own environments, fine-tune it, and inspect it, a critical advantage for sensitive applications. The model's training data is not fully open, although Z.ai does publish its RL framework; even so, its competitive coding benchmarks and cost efficiency position it as a viable alternative to expensive frontier APIs. The news arrives as tokenomics is becoming as important as raw capability, and GLM 5.2 is being compared favorably to DeepSeek's open-weight releases.
- GLM 5.2 achieves 39% F1 on Semgrep's IDOR benchmark, beating Claude Code (32%) at a cost of $0.17 per vulnerability.
- Mixture-of-Experts design: 750B total parameters, 40B active per token; supports a 1M-token context; MIT license.
- Coding benchmarks: 81.0 on Terminal-Bench 2.1 (vs. Claude Opus 4.8's 85.0) and 62.1 on SWE-bench Pro, edging closed frontier models on the latter.
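The F1 figures quoted above combine precision (how many reported findings are real) and recall (how many real vulnerabilities are found) into one number. As a rough sketch of how such a score is computed, the function below uses the standard F1 definition; the counts in the usage note are made up for illustration and are not Semgrep's benchmark data.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall.

    true_positives:  real vulnerabilities the model reported
    false_positives: reported findings that were not real vulnerabilities
    false_negatives: real vulnerabilities the model missed
    """
    if true_positives == 0:
        # No correct findings: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to show the arithmetic.
example = f1_score(true_positives=30, false_positives=50, false_negatives=40)
```

With these illustrative counts, precision is 30/80 = 0.375 and recall is 30/70 ≈ 0.429, giving an F1 of 0.40. Because F1 penalizes both noisy reports and missed bugs, a detector cannot score well by flagging everything or by reporting only its safest findings.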
Why It Matters
Open-weight models now rival closed ones on security tasks, enabling cost-effective, on-premise AI agents for sensitive code analysis.