EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments
A new high-fidelity simulation with 2,500+ entities trains AI agents that outperform frontier models on complex workflows.
Researchers from Surge AI introduced EnterpriseGym Corecraft, a high-fidelity RL environment that simulates a customer support organization with 2,500+ entities and 23 tools. They trained GLM 4.6 with GRPO, raising its task pass rate from 25.37% to 36.76% (a gain of 11.39 percentage points) in a single epoch. The trained agent also generalized beyond its training environment, improving by 4.5-7.4% on three out-of-distribution benchmarks, which suggests that realistic training environments produce more capable and adaptable AI agents.
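For readers unfamiliar with GRPO (Group Relative Policy Optimization), the core idea as it is commonly described is to sample several rollouts for the same task and score each one relative to the rest of its group, rather than against a learned value function. The sketch below illustrates only that normalization step. It is not the researchers' training code: the function name `group_relative_advantages`, the binary pass/fail rewards, and the group size of four are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group.

    `rewards` holds one scalar reward per rollout of the SAME task.
    Rollouts above the group mean get a positive advantage, those
    below get a negative one. `eps` avoids division by zero when
    every rollout in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical pass/fail rewards for four rollouts of one task (assumed values).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every rollout fails carries no learning signal under this scheme.
all_failed = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

In this toy group, the two passing rollouts receive advantages near +1 and the two failing rollouts near -1, while a group in which every rollout fails yields all-zero advantages.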
Why It Matters
This approach could lead to AI agents that reliably perform complex, multi-step professional work, moving beyond simple chatbots to true digital assistants.