Research & Papers

[P] Using residual ML correction on top of a deterministic physics simulator for F1 strategy prediction

A CSE student built a hybrid physics-ML system that runs 10,000 Monte Carlo iterations per race.

Deep Dive

A computer science student has open-sourced F1Predict, a hybrid simulation system for predicting Formula 1 race strategies and outcomes. The architecture layers machine learning on top of deterministic physics. At its core is a deterministic lap time engine that models tire degradation, fuel load, DRS, and traffic. On top of this, a LightGBM residual model, trained on historical telemetry from the FastF1 library, corrects the engine's pace deltas so that each driver profile is more accurate before the Monte Carlo stage runs.
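The residual-correction idea can be sketched in a few lines. This is a minimal illustration, not F1Predict's actual code: the LapState fields, the coefficient values, and the function names are all hypothetical, and a plain Python callable stands in for the trained LightGBM model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LapState:
    """Inputs to the physics engine (hypothetical field names)."""
    base_lap_s: float    # clean-air reference lap time, in seconds
    tire_age_laps: int   # laps completed on the current tire set
    fuel_kg: float       # fuel on board
    drs_open: bool       # DRS available on this lap
    in_traffic: bool     # running in another car's dirty air


# Illustrative placeholder coefficients, not values from the project.
DEG_S_PER_LAP = 0.05
FUEL_S_PER_KG = 0.03
DRS_GAIN_S = 0.4
TRAFFIC_LOSS_S = 0.6


def physics_lap_time(state: LapState) -> float:
    """Deterministic estimate: same inputs always give the same lap time."""
    t = state.base_lap_s
    t += DEG_S_PER_LAP * state.tire_age_laps
    t += FUEL_S_PER_KG * state.fuel_kg
    if state.drs_open:
        t -= DRS_GAIN_S
    if state.in_traffic:
        t += TRAFFIC_LOSS_S
    return t


def corrected_lap_time(
    state: LapState, residual_model: Callable[[LapState], float]
) -> float:
    """Physics estimate plus the learned residual (observed minus physics)."""
    return physics_lap_time(state) + residual_model(state)


state = LapState(base_lap_s=90.0, tire_age_laps=10, fuel_kg=50.0,
                 drs_open=True, in_traffic=False)
# Stand-in for a trained model: this driver is 0.2 s quicker than physics says.
print(round(corrected_lap_time(state, lambda s: -0.2), 3))
```

Because the model only learns the gap between the physics estimate and observed pace, the physics engine stays usable on its own, and the learned part has a smaller, better-behaved target than raw lap time.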

The system then runs a 10,000-iteration Monte Carlo simulation for each race, producing P10/P50/P90 percentile estimates for every driver. An auxiliary safety car hazard classifier modulates the per-lap probability of a safety car event within the simulation. The pipeline versions features such as tire age, qualifying deltas, and track evolution. A separate strategy optimizer runs only 400 iterations to keep web response times reasonable. The stack is built with Python, FastAPI, and LightGBM, with Redis caching results under a key derived from a SHA-256 hash of the request.
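A rough sketch of how these pieces could fit together, using only the standard library. Everything here is an assumption made for illustration: the request fields, the Gaussian lap-time noise, the fixed time penalty for a safety car lap, the use of total race time as the simulated quantity, and the in-memory dict standing in for Redis.

```python
import hashlib
import json
import random


def simulate_race_time(rng: random.Random, request: dict) -> float:
    """One Monte Carlo draw of a single driver's total race time."""
    total = 0.0
    for lap in range(request["laps"]):
        lap_time = rng.gauss(request["base_lap_s"], request["lap_sigma_s"])
        # Per-lap hazard: in the real system a classifier would supply these.
        if rng.random() < request["sc_hazard"][lap]:
            lap_time += request["sc_lap_penalty_s"]
        total += lap_time
    return total


def run_monte_carlo(request: dict, iterations: int = 10_000) -> dict:
    """Summarise the simulated distribution as P10/P50/P90 percentiles."""
    rng = random.Random(request["seed"])  # seeded, so results are repeatable
    totals = sorted(simulate_race_time(rng, request) for _ in range(iterations))

    def pct(p: float) -> float:
        return totals[min(iterations - 1, int(p * iterations))]

    return {"p10": pct(0.10), "p50": pct(0.50), "p90": pct(0.90)}


def cache_key(request: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


_cache: dict = {}  # in-memory stand-in for Redis


def cached_run(request: dict, iterations: int = 10_000) -> dict:
    key = cache_key({"request": request, "iterations": iterations})
    if key not in _cache:
        _cache[key] = run_monte_carlo(request, iterations)
    return _cache[key]


request = {
    "laps": 57,
    "base_lap_s": 92.0,
    "lap_sigma_s": 0.35,
    "sc_hazard": [0.02] * 57,   # flat 2% per lap, purely illustrative
    "sc_lap_penalty_s": 25.0,
    "seed": 42,
}
summary = cached_run(request, iterations=1_000)
print(summary)
```

Serialising with sorted keys before hashing means two requests with the same content but different key order share one cache entry, which is the usual reason to canonicalise before computing a hash-based key.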

This project represents a significant learning exercise in MLOps and systems architecture. The ML layer is designed to degrade gracefully, falling back to the pure physics simulator if a trained model artifact is unavailable. While the v1 residual model is still being trained on a broader dataset, the scaffolding for feature governance, versioning, and a clean API is fully in place. The live demo and GitHub repository invite technical feedback on the modeling and architectural choices.
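The graceful-degradation behavior might look something like the following. This is a sketch under stated assumptions rather than the project's implementation: the artifact path, the loader callable, and the function names are hypothetical, and the loader is passed in so the example does not depend on any particular model library.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

Features = Dict[str, float]
Residual = Callable[[Features], float]


def zero_residual(features: Features) -> float:
    """Fallback correction: leave the physics estimate untouched."""
    return 0.0


def load_residual(
    artifact: Path, loader: Callable[[Path], Residual]
) -> Tuple[Residual, bool]:
    """Return (residual_fn, ml_enabled).

    If the artifact is missing or fails to load, fall back to a zero
    correction so callers get pure-physics predictions instead of an error.
    """
    if not artifact.is_file():
        return zero_residual, False
    try:
        return loader(artifact), True
    except Exception:
        return zero_residual, False


def predict_lap_time(physics_s: float, features: Features,
                     residual: Residual) -> float:
    """Same call path whether or not the ML layer is active."""
    return physics_s + residual(features)


def unused_loader(path: Path) -> Residual:
    # Never reached in this demo because the artifact does not exist.
    raise RuntimeError("loader should not be called for a missing file")


# Hypothetical artifact path that is not present on disk.
residual, ml_enabled = load_residual(
    Path("models/residual_v1.missing"), unused_loader
)
print(ml_enabled, predict_lap_time(91.6, {"tire_age": 10.0}, residual))
```

Returning a flag alongside the function lets an API response report whether a prediction came from the hybrid path or from physics alone.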

Key Points
  • Hybrid architecture combines a deterministic physics simulator with a LightGBM ML model for residual correction.
  • Runs 10,000 Monte Carlo iterations per race to generate probabilistic driver performance distributions (P10/P50/P90).
  • Includes a safety car hazard classifier and a separate 400-iteration strategy optimizer for web performance.

Why It Matters

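Demonstrates a practical approach to combining physics-based simulation with ML for complex real-world forecasting: the engineering scaffolding is in place and the system stays usable on physics alone while the v1 residual model is still being trained.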