
Surge AI's EnterpriseGym Corecraft lifts an agent's pass rate by about 45% (relative) on enterprise tasks

arXiv cs.AI February 19, 2026

⚡ A new high-fidelity simulation with 2,500+ entities trains AI agents that outperform frontier models on complex workflows.

Deep Dive

Researchers from Surge AI introduced EnterpriseGym Corecraft, a high-fidelity reinforcement learning (RL) environment that simulates a customer support organization with 2,500+ entities and 23 tools. They trained GLM 4.6 with Group Relative Policy Optimization (GRPO), raising its task pass rate from 25.37% to 36.76% in one epoch. That is a gain of 11.39 percentage points, or roughly 45% relative to the baseline. The trained agent also generalized beyond the training environment, improving by 4.5-7.4% on three out-of-distribution benchmarks. The authors take this as evidence that realistic training environments produce more capable and adaptable AI agents.
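The "45%" in the headline appears to be a relative improvement rather than a percentage-point change. The short sketch below works through that arithmetic using only the two pass rates reported above; the function name `pass_rate_gain` is a hypothetical helper introduced for illustration, not something from the paper.

```python
def pass_rate_gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration; both inputs are pass rates in percent.
    """
    absolute = after - before               # difference in percentage points
    relative = absolute / before * 100.0    # gain as a share of the baseline
    return absolute, relative


# Pass rates reported in the summary: 25.37% before training, 36.76% after.
absolute, relative = pass_rate_gain(25.37, 36.76)
print(f"absolute gain: {absolute:.2f} percentage points")  # 11.39
print(f"relative gain: {relative:.1f}%")                   # 44.9%
```

The relative figure rounds to the 45% quoted in the headline, while the absolute gain is about 11.4 points, so the two numbers describe the same result from different baselines.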

Why It Matters

This approach could lead to AI agents that reliably perform complex, multi-step professional work, moving beyond simple chatbots to true digital assistants.
