FPILOT: Inference-time optimization boosts RL trading agents without retraining
New framework adapts any pretrained RL agent using price forecasts to improve returns.
Researchers Eun Go, Rohan Deb, and Arindam Banerjee have introduced FPILOT (Financial Plugin Inference-time Learning for Optimal Trading), a framework that improves reinforcement learning agents for portfolio management by optimizing them at inference time. Traditional RL trading agents are deployed as static policies that cannot incorporate price forecasts during execution. FPILOT removes this limitation by treating future prices as independent of a single agent's actions, so a separate predictive model can generate multi-step price trajectories without iterative rollouts. At each decision step, the framework uses the predicted price path to construct an imagined return objective and optimizes the policy against it before executing one trade. It works as a plugin for any pretrained agent, with no retraining required.
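To make the per-step loop described above concrete, here is a minimal sketch under strong simplifying assumptions. The policy is reduced to one logit per asset with softmax portfolio weights, the imagined return is a discounted sum of forecast per-step returns, and the optimizer is plain gradient ascent. None of these choices come from the FPILOT authors; the names `imagined_asset_returns` and `adapt_and_act` and the parameters `steps`, `lr`, and `gamma` are hypothetical.

```python
import math


def softmax(logits):
    """Map logits to portfolio weights that are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imagined_asset_returns(price_path, gamma=0.99):
    """Discounted sum of forecast per-step simple returns for each asset.

    price_path[t][i] is the forecast price of asset i at step t,
    with t = 0 holding the current (observed) prices.
    """
    n_assets = len(price_path[0])
    scores = [0.0] * n_assets
    for t in range(1, len(price_path)):
        for i in range(n_assets):
            step_ret = price_path[t][i] / price_path[t - 1][i] - 1.0
            scores[i] += gamma ** (t - 1) * step_ret
    return scores


def adapt_and_act(logits, price_path, steps=20, lr=1.0, gamma=0.99):
    """Adapt a copy of the policy logits on the imagined return, then act.

    Assumed objective: J(theta) = sum_i softmax(theta)_i * r_i, whose
    gradient is dJ/dtheta_j = w_j * (r_j - J).
    """
    r = imagined_asset_returns(price_path, gamma)
    theta = list(logits)  # leave the pretrained parameters untouched
    for _ in range(steps):
        w = softmax(theta)
        j = sum(wi * ri for wi, ri in zip(w, r))
        theta = [th + lr * wi * (ri - j) for th, wi, ri in zip(theta, w, r)]
    return softmax(theta), theta


# Hypothetical 3-asset example: the forecast says asset 0 rises,
# asset 1 stays flat, and asset 2 falls over the next two steps.
pretrained_logits = [0.0, 0.0, 0.0]
forecast = [[100.0, 50.0, 20.0], [101.0, 50.0, 19.8], [102.0, 50.0, 19.6]]
weights, adapted_logits = adapt_and_act(pretrained_logits, forecast)
print(weights)
```

Only the first trade implied by `weights` would be executed; at the next step a fresh forecast arrives and the adaptation is repeated. Whether the adapted parameters are carried forward or reset to the pretrained ones each step is a design choice this sketch leaves open.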
Evaluated across five policy learning algorithms on the TradeMaster DJ30 benchmark, FPILOT delivered consistent improvements in total return and three risk-adjusted metrics: Sharpe ratio, Sortino ratio, and Calmar ratio. Stochastic policies showed larger gains than deterministic ones, suggesting the method is particularly effective for policies that output action distributions. The researchers also tested FPILOT with synthetic forecasts at calibrated quality levels and found that performance gains increased with forecaster accuracy. This result suggests that FPILOT's benefit should grow as financial forecasting models improve, so existing RL trading pipelines could take advantage of better forecasters without retraining.
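For readers unfamiliar with the metrics named above, the following sketch computes them from a series of per-period portfolio returns using common textbook definitions. It assumes daily returns, 252 trading periods per year, and a zero risk-free rate; the exact conventions used by TradeMaster may differ, and the function names are illustrative.

```python
import math


def total_return(returns):
    """Compounded return over the whole series."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized mean return divided by sample standard deviation."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def sortino_ratio(returns, periods_per_year=252):
    """Like Sharpe, but penalizes only returns below zero."""
    mean = sum(returns) / len(returns)
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    return mean / downside * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst


def calmar_ratio(returns, periods_per_year=252):
    """Annualized compounded return divided by maximum drawdown."""
    years = len(returns) / periods_per_year
    annualized = (1.0 + total_return(returns)) ** (1.0 / years) - 1.0
    return annualized / max_drawdown(returns)


# Made-up daily returns, for illustration only.
daily = [0.02, -0.01, 0.015, -0.005, 0.01]
print(total_return(daily), sharpe_ratio(daily),
      sortino_ratio(daily), calmar_ratio(daily))
```

The ratios are undefined (division by zero) for a series with no variance, no losing periods, or no drawdown, so production code would need to guard those cases.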
- FPILOT acts as a plugin that adapts any pretrained RL trading agent at inference time using price forecasts, with no retraining.
- Tested with five policy learning algorithms on the TradeMaster DJ30 benchmark, improving total return and risk-adjusted metrics (Sharpe, Sortino, Calmar).
- Gains scale with forecaster quality: better price predictions yield larger improvements in trading performance.
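The summary does not say how the synthetic forecasts were calibrated. One standard way to produce a forecast of controlled quality, shown here purely as an assumption, is to blend the true future returns with independent noise so that the expected correlation with the truth equals a chosen `quality` value between 0 and 1. The function names are hypothetical.

```python
import math
import random


def synthetic_forecast(true_returns, quality, rng):
    """Blend truth with Gaussian noise; expected correlation is `quality`.

    quality = 1.0 reproduces the true returns, quality = 0.0 is pure noise
    with the same mean and standard deviation as the truth.
    """
    n = len(true_returns)
    mean = sum(true_returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in true_returns) / n)
    noise_scale = math.sqrt(1.0 - quality ** 2)
    return [
        mean + quality * (r - mean) + noise_scale * rng.gauss(0.0, std)
        for r in true_returns
    ]


def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Illustration with simulated "true" returns (not real market data).
rng = random.Random(0)
truth = [rng.gauss(0.0, 0.01) for _ in range(2000)]
for q in (0.2, 0.5, 0.8):
    forecast = synthetic_forecast(truth, q, rng)
    print(q, correlation(truth, forecast))
```

Sweeping `quality` and feeding each forecast to the adapted agent would trace out how trading performance responds to forecaster accuracy, which is the kind of relationship the third takeaway describes.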
Why It Matters
Lets existing RL trading strategies dynamically incorporate market forecasts, boosting returns without costly retraining.