Global Sequential Testing for Multi-Stream Auditing
New method speeds up detection of AI failures across multiple data streams, matching the standard Bonferroni correction when failures are sparse and outperforming it when they are widespread.
A team of researchers, including Beepul Bharti, Ambar Pal, and Jeremias Sulam, has released a new paper on arXiv titled 'Global Sequential Testing for Multi-Stream Auditing'. The work addresses a critical challenge in deploying machine learning systems in risk-sensitive domains: the need for continuous, real-time auditing to detect performance degradation or failures. The problem is framed as a sequential hypothesis test across k incoming data streams. The goal is to reject, as quickly as possible, the global null hypothesis that every stream reflects a normally functioning system. The standard approach applies a Bonferroni correction, which splits the error budget α across the k streams. This keeps false alarms under control but slows detection as k grows. Each stream must clear a stricter threshold, and evidence is not pooled when many streams fail at once.
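To make the baseline concrete, here is a minimal sketch of a Bonferroni-style sequential test. It assumes each stream i yields a nonnegative test martingale M_t^(i) that starts at 1. Here that martingale is a Gaussian likelihood ratio for a hypothetical mean shift `delta`. The stream model and all function names are illustrative and are not taken from the paper. By Ville's inequality and a union bound, stopping the first time any M_t^(i) reaches k/α keeps the probability of a false alarm at most α.

```python
import math
import random


def gaussian_log_lr(x: float, delta: float) -> float:
    """Log likelihood ratio of N(delta, 1) against N(0, 1) for one observation.

    Under the null (mean 0), exp of this quantity has expectation 1, so the
    running product over time is a nonnegative test martingale starting at 1.
    """
    return delta * x - 0.5 * delta ** 2


def bonferroni_stopping_time(streams, alpha: float, delta: float):
    """First time t (1-indexed) at which max_i M_t^(i) >= k / alpha, else None.

    `streams` is a list of k equal-length lists of observations. Martingales
    are tracked in log space to avoid overflow.
    """
    k = len(streams)
    log_threshold = math.log(k / alpha)
    log_m = [0.0] * k  # log M_0^(i) = 0, i.e. every martingale starts at 1
    for t in range(len(streams[0])):
        for i in range(k):
            log_m[i] += gaussian_log_lr(streams[i][t], delta)
        if max(log_m) >= log_threshold:
            return t + 1
    return None


def simulate(k, n, n_failing, shift, seed=0):
    """k streams of n unit-variance Gaussian draws; the first n_failing have mean `shift`."""
    rng = random.Random(seed)
    return [[rng.gauss(shift if i < n_failing else 0.0, 1.0) for _ in range(n)]
            for i in range(k)]


if __name__ == "__main__":
    # Hypothetical setup: 50 streams, one of which drifts to mean 0.5.
    streams = simulate(k=50, n=2000, n_failing=1, shift=0.5)
    print(bonferroni_stopping_time(streams, alpha=0.05, delta=0.5))
```

Because the threshold is k/α, a failing stream must accumulate roughly ln(k/α) in log-evidence before the alarm fires. This holds whether one stream or all of them are failing.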
The researchers construct new sequential tests by merging test martingales. The resulting methods trade off expected stopping time differently depending on whether failures are sparse (only a few streams affected) or dense (many streams affected). Their key contribution is a new 'balanced test' with an improved theoretical bound on expected stopping time. It matches the Bonferroni bound of O(ln(k/α)) in the sparse setting, and under a dense alternative it improves to O((1/k)ln(1/α)). Because this bound shrinks as 1/k, the test detects widespread system failures faster as the number of monitored streams k increases. The paper includes empirical validation on synthetic and real-world data, demonstrating the practical effectiveness of the proposed tests for faster, more reliable AI system monitoring.
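The sparse/dense trade-off can be illustrated by merging per-stream test martingales in two ways. Averaging them behaves like Bonferroni. Multiplying them pools evidence when many streams fail together. The sketch below assumes independent streams, so the product is also a test martingale under the global null. It uses Gaussian likelihood-ratio martingales with a hypothetical shift `delta`. The paper's balanced test is not reproduced here. The `balanced` entry is a hypothetical equal-weight mixture of the average and the product. It was chosen only because a mixture of test martingales is still a test martingale and it inherits a version of both behaviors. Each merged martingale is compared against 1/α.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def merged_log_martingales(log_m):
    """Merge per-stream log test martingales into three global log martingales.

    Assumes independent streams, so the product (not just the average) is a
    test martingale under the global null.
    """
    k = len(log_m)
    log_avg = logsumexp(log_m) - math.log(k)  # (1/k) * sum_i M^(i)
    log_prod = sum(log_m)                     # prod_i M^(i)
    # Hypothetical "balanced" merge: 0.5 * average + 0.5 * product.
    log_bal = logsumexp([log_avg, log_prod]) - math.log(2)
    return {"average": log_avg, "product": log_prod, "balanced": log_bal}


def stopping_times(streams, alpha, delta):
    """First time (1-indexed) each merged martingale reaches 1/alpha, else None."""
    k = len(streams)
    log_threshold = math.log(1.0 / alpha)
    log_m = [0.0] * k  # every per-stream martingale starts at 1
    stops = {"average": None, "product": None, "balanced": None}
    for t in range(len(streams[0])):
        for i in range(k):
            # Gaussian likelihood-ratio step for N(delta, 1) vs N(0, 1).
            log_m[i] += delta * streams[i][t] - 0.5 * delta ** 2
        for name, value in merged_log_martingales(log_m).items():
            if stops[name] is None and value >= log_threshold:
                stops[name] = t + 1
        if all(s is not None for s in stops.values()):
            break
    return stops


if __name__ == "__main__":
    rng = random.Random(1)
    k, n, shift = 50, 3000, 0.5

    def make_streams(n_failing):
        # The first n_failing streams have mean `shift`; the rest stay at 0.
        return [[rng.gauss(shift if i < n_failing else 0.0, 1.0) for _ in range(n)]
                for i in range(k)]

    print("sparse (1 failing): ", stopping_times(make_streams(1), 0.05, shift))
    print("dense (all failing):", stopping_times(make_streams(k), 0.05, shift))
```

Heuristically, suppose each failing stream gains log-evidence at rate μ per step (μ = δ²/2 when the true shift equals `delta`). The mixture then stops after about ln(2k/α)/μ steps when one stream fails, via its average component. It stops after about ln(2/α)/(kμ) steps when all k streams fail, via its product component. This mirrors the O(ln(k/α)) and O((1/k)ln(1/α)) rates quoted above, though it should not be read as the paper's construction or proof.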
- Proposes new sequential tests using merged test martingales for auditing k data streams from AI systems.
- Derives a 'balanced test' that matches Bonferroni's O(ln(k/α)) bound under sparse failures and improves the expected stopping time bound to O((1/k)ln(1/α)) under dense failures.
- Empirically validated on synthetic and real-world data, supporting quicker detection of ML system degradation in risk-sensitive settings such as finance or healthcare.
Why It Matters
Enables faster, more reliable real-time monitoring of critical AI systems, reducing risk in finance, healthcare, and autonomous operations.