The World Leaks the Future: Harness Evolution for Future Prediction Agents

New 'internal feedback' system allows AI agents to learn from their own evolving predictions before outcomes are known.

Deep Dive

A research team from Zhongguancun Academy, University of Science and Technology of China, and Tsinghua University has introduced Milkyway, a novel framework for building AI agents that make predictions about unresolved future events. The core innovation addresses a fundamental limitation: while real-world decisions must be made with incomplete information, traditional AI training relies on final outcomes, which are too coarse and arrive too late to guide the nuanced process of evidence gathering and interpretation. Milkyway solves this by enabling agents to learn from their own evolving predictions over time.

Instead of retraining the base large language model (LLM), Milkyway maintains and updates a separate, persistent component called a 'future prediction harness.' This harness manages the prediction process: tracking key factors, gathering and interpreting public evidence, and handling uncertainty. As an agent makes repeated predictions on the same open question (e.g., 'Who will win the election?'), the system compares its newer predictions against older ones. The discrepancies, termed 'internal feedback,' reveal flaws in the earlier reasoning, and these are used to refine the harness in real time, before the final answer is known.
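The comparison of newer predictions against older ones can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Prediction` and `Harness` classes, the `internal_feedback` function, and the 0.2 confidence-shift threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Prediction:
    step: int          # index of the prediction round
    answer: str        # predicted outcome at that round
    confidence: float  # agent's stated probability for `answer`


def internal_feedback(old: Prediction, new: Prediction,
                      shift_threshold: float = 0.2) -> Optional[str]:
    """Return a note when a newer prediction disagrees with an older one.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    if old.answer != new.answer:
        return (f"step {old.step}->{new.step}: answer changed from "
                f"{old.answer!r} to {new.answer!r}")
    if abs(new.confidence - old.confidence) >= shift_threshold:
        return (f"step {old.step}->{new.step}: confidence moved from "
                f"{old.confidence:.2f} to {new.confidence:.2f}")
    return None


@dataclass
class Harness:
    """Hypothetical stand-in for the persistent prediction harness."""
    notes: List[str] = field(default_factory=list)
    history: Dict[str, List[Prediction]] = field(default_factory=dict)

    def record(self, question: str, pred: Prediction) -> Optional[str]:
        """Store a prediction; log feedback if it departs from the last one."""
        past = self.history.setdefault(question, [])
        feedback = internal_feedback(past[-1], pred) if past else None
        if feedback is not None:
            self.notes.append(feedback)  # the harness changes; the LLM does not
        past.append(pred)
        return feedback


harness = Harness()
harness.record("Who will win the election?", Prediction(0, "A", 0.60))
print(harness.record("Who will win the election?", Prediction(1, "B", 0.70)))
```

In this sketch the feedback is only a text note. How discrepancies are actually turned into harness updates is not described in this summary, so that step is deliberately left out.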

The method represents a shift from outcome-supervised learning to process-supervised refinement. After a question is finally resolved, the outcome serves as a 'retrospective check' to validate the harness updates before they are carried forward to new prediction tasks. In benchmarks, the approach raised the score on FutureX from 44.07 to 60.90 (a 38% relative improvement) and on FutureWorld from 62.22 to 77.96 (about 25%). The work suggests a path toward more adaptive, self-improving AI systems that can reason under uncertainty with strategies that improve over time.
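One way to picture the retrospective check is as a filter over updates proposed while the question was still open. The sketch below is an assumption-laden illustration, not the paper's procedure: `PendingUpdate`, `retrospective_check`, and the squared-error acceptance rule are hypothetical, and the summary does not say how Milkyway actually validates updates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingUpdate:
    note: str        # proposed change to the harness
    p_before: float  # probability of "yes" before the revision behind this update
    p_after: float   # probability of "yes" after that revision


def retrospective_check(updates: List[PendingUpdate],
                        outcome: bool) -> List[PendingUpdate]:
    """Keep only updates whose revision moved the forecast toward the outcome.

    Assumed rule: compare squared error against the resolved outcome before
    and after the revision that motivated the update.
    """
    target = 1.0 if outcome else 0.0
    kept = []
    for u in updates:
        err_before = (u.p_before - target) ** 2
        err_after = (u.p_after - target) ** 2
        if err_after < err_before:
            kept.append(u)  # validated: carry forward to new questions
    return kept


pending = [
    PendingUpdate("weight recent polls more heavily", 0.40, 0.70),
    PendingUpdate("discount official statements", 0.60, 0.30),
]
for u in retrospective_check(pending, outcome=True):
    print(u.note)
```

With the event resolving to "yes", only the first update survives, because its revision moved the forecast closer to the truth.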

Key Points
  • Milkyway uses 'internal feedback'—contrasting an agent's own predictions over time—to learn and improve before a final outcome is known.
  • The system raises the FutureX benchmark score by 38% in relative terms (44.07 to 60.90) without retraining the core LLM.
  • It introduces a persistent 'future prediction harness' that manages factor tracking, evidence gathering, and uncertainty handling, which evolves across tasks.
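
The 38% figure is a relative gain, not a change in percentage points. A short Python check of the arithmetic, using only the scores reported above (the helper name `relative_gain` is ours):

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


scores = {
    "FutureX": (44.07, 60.90),
    "FutureWorld": (62.22, 77.96),
}

for name, (before, after) in scores.items():
    print(f"{name}: +{after - before:.2f} points, "
          f"{relative_gain(before, after):.1f}% relative")
# FutureX: +16.83 points, 38.2% relative
# FutureWorld: +15.74 points, 25.3% relative
```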

Why It Matters

The approach could help AI systems make better real-time forecasts in finance, policy, and logistics by learning from their own evolving reasoning process rather than waiting for final outcomes.