Research & Papers

Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital

3,505 AI agents traded real ETH for 21 days. Here is what broke and how the authors fixed it.

Deep Dive

A new paper from T.J. Barton and six co-authors presents DX Terminal Pro, a 21-day deployment where 3,505 user-funded autonomous language-model agents traded real ETH in a bounded onchain market. The system handled 7.5M agent invocations, roughly 300K onchain actions, about $20M in volume, and over 5,000 ETH deployed. It consumed roughly 70B inference tokens and achieved 99.9% settlement success for policy-valid submitted transactions. Long-running agents accumulated thousands of sequential decisions, including 6,000+ prompt-state-action cycles for continuously active agents, yielding a large-scale trace from user mandate to rendered prompt, reasoning, validation, portfolio state, and settlement.

Crucially, reliability did not come from the base model alone. It emerged from the operating layer around the model: prompt compilation, typed controls, policy validation, execution guards, memory design, and trace-level observability. Pre-launch testing exposed failures that text-only benchmarks rarely measure, including fabricated trading rules, fee paralysis, numeric anchoring, cadence trading, and misread tokenomics. Targeted harness changes reduced fabricated sell rules from 57% to 3%, reduced fee-led observations from 32.5% to below 10%, and increased capital deployment from 42.9% to 78.0% in an affected test population. The authors argue that capital-managing agents should be evaluated across the full path from user mandate to prompt, validated action, and settlement.
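To make "typed controls" and "policy validation" concrete, here is a minimal sketch of what such a layer might look like. It is not the paper's implementation: the `ProposedAction` and `Policy` schemas, the `validate` function, and the specific limits are all hypothetical, chosen only to illustrate checking a model-proposed trade against user-mandate limits before it is submitted for settlement.

```python
from dataclasses import dataclass
from enum import Enum


class Side(Enum):
    BUY = "buy"
    SELL = "sell"


@dataclass(frozen=True)
class ProposedAction:
    """A typed action parsed from model output (hypothetical schema)."""
    side: Side
    token: str
    amount_eth: float


@dataclass(frozen=True)
class Policy:
    """Limits derived from a user mandate (hypothetical fields)."""
    allowed_tokens: frozenset
    max_trade_eth: float


def validate(action: ProposedAction, policy: Policy, balance_eth: float) -> list[str]:
    """Return policy violations; an empty list means the action may be submitted."""
    errors = []
    if action.token not in policy.allowed_tokens:
        errors.append(f"token {action.token!r} is outside the bounded market")
    if action.amount_eth <= 0:
        errors.append("amount must be positive")
    if action.amount_eth > policy.max_trade_eth:
        errors.append("amount exceeds the per-trade limit")
    if action.side is Side.BUY and action.amount_eth > balance_eth:
        errors.append("amount exceeds the available balance")
    return errors


# Example: one valid action and one that violates several rules at once.
policy = Policy(allowed_tokens=frozenset({"AAA", "BBB"}), max_trade_eth=0.5)
ok = validate(ProposedAction(Side.BUY, "AAA", 0.2), policy, balance_eth=1.0)
bad = validate(ProposedAction(Side.BUY, "ZZZ", 2.0), policy, balance_eth=1.0)
```

In this sketch the validator collects every violation rather than stopping at the first, so a rejected action can be logged with its full set of reasons. That fits the trace-level observability the paper describes, though the actual system's design may differ.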

Key Points
  • 3,505 AI agents traded real ETH onchain for 21 days, handling about $20M in volume and roughly 300K onchain actions.
  • Reliability came from an operating layer (prompt compilation, typed controls, policy validation), not the base model.
  • Pre-launch harness changes reduced fabricated sell rules from 57% to 3% and increased capital deployment from 42.9% to 78.0% in an affected test population.

Why It Matters

Suggests that AI agents can manage real capital reliably when wrapped in layered controls, rather than by relying on smarter models alone.