[P] Best approach for online crowd density prediction from noisy video counts? (no training data)
A developer's challenge: predicting crowd density 5 to 10 frames ahead with only noisy head counts and no historical data.
A developer on Reddit has posed a challenging real-time forecasting problem: predicting crowd density from noisy video counts without any historical training data. The input is a stream of per-frame head counts produced by the P2PNet model on video clips; the counts are stable but carry roughly ±10% noise. The core task is to forecast density 5 to 10 frames ahead for specific zones and to estimate the time until a critical crowd threshold is reached. The current solution combines exponential moving average (EMA) smoothing with Gaussian-weighted linear extrapolation, and the results are mixed: the Mean Absolute Error (MAE) is around 20 on a 55-frame sequence, and the method predicts the direction of count changes correctly only 49% of the time, no better than a coin flip.
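For readers who want to see what such a baseline involves, here is a minimal sketch in plain Python. It is not the developer's code: the function names, the smoothing factor `alpha`, the Gaussian width `sigma`, and the threshold helper are all assumptions chosen for illustration. It smooths the counts with an EMA, fits a line by weighted least squares with weights that favor recent frames, and extrapolates that line.

```python
import math


def ema(values, alpha=0.3):
    """Exponential moving average; alpha is an assumed smoothing factor."""
    smoothed = []
    s = values[0]
    for v in values:
        s = alpha * v + (1 - alpha) * s
        smoothed.append(s)
    return smoothed


def weighted_linear_forecast(values, horizon=10, sigma=5.0):
    """Weighted least-squares line fit, extrapolated `horizon` frames ahead.

    Gaussian weights (width `sigma`, an assumed value) make recent frames
    count more than old ones. Returns (forecast, slope_per_frame).
    """
    n = len(values)
    t = list(range(n))
    # Weight of frame i decays with its distance from the latest frame.
    w = [math.exp(-0.5 * ((n - 1 - i) / sigma) ** 2) for i in t]
    sw = sum(w)
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, values)) / sw
    var = sum(wi * (ti - t_mean) ** 2 for wi, ti in zip(w, t))
    cov = sum(wi * (ti - t_mean) * (yi - y_mean)
              for wi, ti, yi in zip(w, t, values))
    slope = cov / var if var > 0 else 0.0
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + horizon), slope


def frames_to_threshold(level, slope, threshold):
    """Frames until a line starting at `level` reaches `threshold`.

    Returns 0.0 if already at or above it, None if the count is not rising.
    """
    if level >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - level) / slope
```

A call such as `forecast, slope = weighted_linear_forecast(ema(counts))` would give a 10-frame-ahead estimate for one zone. Because the EMA lags behind a rising or falling signal, a fit on the smoothed series tends to underestimate the current level, which may be one contributor to the weak direction accuracy reported in the post.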
The constraints are strict: the system must run online and in real time on a CPU, with no access to past data for model training. This rules out data-hungry deep learning approaches and pushes the solution toward classical signal processing and time-series forecasting techniques. The developer is asking the community whether a Kalman filter, which estimates the state of a dynamic system from noisy observations, or double exponential smoothing (Holt's method), which adds a trend term to the forecast, would be more suitable. The discussion highlights a gap between academic crowd-counting models and practical deployment, where clean data and GPU inference are often unavailable.
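To make the two candidates concrete, the following is a rough sketch of each in dependency-free Python. Both run online with one cheap update per frame, which fits the CPU-only constraint. Everything specific here is an assumption rather than something from the post: the class names, the smoothing factors `alpha` and `beta`, the process-noise values `q_level` and `q_rate`, and the choice to model measurement noise as a standard deviation proportional to the count (`noise_frac`).

```python
class HoltForecaster:
    """Double exponential smoothing (Holt's linear trend method)."""

    def __init__(self, alpha=0.4, beta=0.2):
        self.alpha, self.beta = alpha, beta  # assumed smoothing factors
        self.level = None
        self.trend = 0.0

    def update(self, y):
        if self.level is None:  # first frame: just initialize the level
            self.level = float(y)
            return
        prev = self.level
        self.level = self.alpha * y + (1 - self.alpha) * (prev + self.trend)
        self.trend = (self.beta * (self.level - prev)
                      + (1 - self.beta) * self.trend)

    def forecast(self, h):
        return self.level + h * self.trend


class ConstantVelocityKalman:
    """Kalman filter with state [count, rate] and a scalar count measurement."""

    def __init__(self, q_level=1.0, q_rate=0.1, noise_frac=0.10):
        self.q_level, self.q_rate = q_level, q_rate  # assumed process noise
        self.noise_frac = noise_frac  # assumed: noise std ~ fraction of count
        self.x = None                 # state estimate [count, rate]
        self.P = [[1e3, 0.0], [0.0, 1e3]]  # large initial uncertainty

    def update(self, y):
        if self.x is None:
            self.x = [float(y), 0.0]
            return
        # Predict: count advances by rate, rate stays; P <- F P F^T + Q.
        c, r = self.x
        c_pred = c + r
        P = self.P
        p00 = P[0][0] + P[0][1] + P[1][0] + P[1][1] + self.q_level
        p01 = P[0][1] + P[1][1]
        p10 = P[1][0] + P[1][1]
        p11 = P[1][1] + self.q_rate
        # Update with measurement y (H = [1, 0]).
        R = (self.noise_frac * max(abs(y), 1.0)) ** 2
        S = p00 + R
        k0, k1 = p00 / S, p10 / S
        resid = y - c_pred
        self.x = [c_pred + k0 * resid, r + k1 * resid]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]

    def forecast(self, h):
        return self.x[0] + h * self.x[1]
```

In both cases the per-zone loop is the same: call `update(count)` on each new frame, then `forecast(10)` for the 10-frame-ahead estimate. The main practical difference is that the Kalman filter carries an explicit noise model, so the ±10% figure can be plugged in directly, while Holt's method has two smoothing factors that would need to be tuned by hand.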
- Problem uses P2PNet-generated head counts with ±10% noise and no historical training data.
- Current EMA-smoothed linear extrapolation has an MAE of about 20 and only 49% direction prediction accuracy.
- Solution must run online, in real-time, on CPU-only hardware, favoring classical forecasting techniques.
Why It Matters
This challenge bridges AI research and real-world safety applications, where reliable, low-latency crowd forecasting is critical.