Research & Papers

AI Agents Enable Proactive Real-Time Analytics Without Manual Queries

Instead of waiting for manual queries, AI agents autonomously discover insights in live data streams.

Deep Dive

Modern analytics tools require users to define queries by hand. That approach breaks down in real-time streaming environments, where the space of potential insights is too large to anticipate one query at a time. Rossiello and Subramanian propose a multi-agent system that inverts this paradigm. The architecture implements a continuous discovery loop: specialized agents generate hypotheses, compile them into executable analytics, validate the results, and produce visualizations or deployable applications. The system uses Apache Kafka for event-driven coordination, Apache Flink for stream processing, and large language models (LLMs) to power the agents. A key innovation is a contract-driven design built on typed intermediate artifacts, which is designed to ensure modularity, observability, lineage tracking, and safe execution of dynamically generated code.
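The idea of a loop built from typed intermediate artifacts can be pictured with a minimal sketch. The class and function names below (`Hypothesis`, `CompiledAnalytic`, `ValidationResult`, `discovery_step`) are hypothetical and are not the paper's API. The LLM-backed agents are replaced with fixed stand-in functions, and Kafka and Flink are left out, so the sketch only shows how each stage emits a typed artifact that records the ID of the artifact it was derived from.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Callable


def _new_id() -> str:
    return uuid.uuid4().hex


@dataclass(frozen=True)
class Hypothesis:
    """Artifact emitted by a (hypothetical) hypothesis-generating agent."""
    statement: str
    metric: str  # name of the measurement the hypothesis is about
    artifact_id: str = field(default_factory=_new_id)


@dataclass(frozen=True)
class CompiledAnalytic:
    """Executable check derived from one Hypothesis; parent_id records lineage."""
    parent_id: str
    check: Callable[[list[float]], float]
    artifact_id: str = field(default_factory=_new_id)


@dataclass(frozen=True)
class ValidationResult:
    """Outcome of running a CompiledAnalytic on a window of stream data."""
    parent_id: str
    score: float
    accepted: bool
    artifact_id: str = field(default_factory=_new_id)


def compile_hypothesis(h: Hypothesis) -> CompiledAnalytic:
    # Stand-in for an LLM-backed compiler agent: maps a known metric name to a
    # fixed function and rejects anything it does not recognize.
    if h.metric == "max_jump":
        def max_jump(xs: list[float]) -> float:
            # Largest absolute change between consecutive observations.
            return max((abs(b - a) for a, b in zip(xs, xs[1:])), default=0.0)
        return CompiledAnalytic(parent_id=h.artifact_id, check=max_jump)
    raise ValueError(f"unsupported metric: {h.metric}")


def validate(a: CompiledAnalytic, window: list[float], threshold: float) -> ValidationResult:
    # Stand-in for a validator agent: accept the finding if the score clears a threshold.
    score = a.check(window)
    return ValidationResult(parent_id=a.artifact_id, score=score, accepted=score > threshold)


def discovery_step(h: Hypothesis, window: list[float], threshold: float):
    """One pass of hypothesize -> compile -> validate (visualization omitted)."""
    analytic = compile_hypothesis(h)
    result = validate(analytic, window, threshold)
    return analytic, result


hyp = Hypothesis(statement="Stock for one SKU drops abruptly", metric="max_jump")
analytic, result = discovery_step(hyp, [100.0, 98.0, 97.0, 40.0, 39.0], threshold=20.0)
print(result.accepted, result.score)  # True 57.0
# Lineage can be followed backwards: result -> analytic -> hypothesis.
```

Because every artifact is immutable and carries its parent's ID, a downstream consumer can trace any accepted finding back to the hypothesis that produced it, which is one plausible reading of the "lineage" property the summary mentions.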

Through case studies in retail (e.g., real-time inventory anomalies), finance (identifying fraud patterns), and public data (shifts in traffic patterns), the paper shows how the architecture supports a transition from reactive, query-driven analytics to proactive, discovery-driven systems. Instead of waiting for a user to ask "What happened?", the agents continuously surface unexpected patterns and changes. This work was accepted at the Supporting Our AI Overlords (SAO) workshop at the ACM Conference on AI and Agentic Systems (CAIS) on May 26, 2026. It is a step toward making data systems more autonomous and self-optimizing, with the potential to reduce analysts' cognitive load and speed up decision-making.
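To make the retail case more concrete, the sketch below shows the kind of executable check a hypothesis such as "inventory for this SKU is behaving abnormally" might compile into. It uses a rolling z-score over a sliding window, a generic anomaly-detection technique chosen here for illustration, not the method reported in the paper. The function name, window size, and threshold are assumptions, and in the described architecture such logic would run as a Flink stream operator rather than a Python generator.

```python
from __future__ import annotations

from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator


def rolling_zscore_anomalies(
    values: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[tuple[int, float, float]]:
    """Yield (index, value, z) for each point whose z-score, computed against
    the preceding `window` points, exceeds `threshold` in absolute value."""
    history = deque(maxlen=window)  # sliding window of recent observations
    for i, x in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip windows with zero spread to avoid dividing by zero.
            if sigma > 0:
                z = (x - mu) / sigma
                if abs(z) > threshold:
                    yield i, x, z
        history.append(x)


# Hypothetical per-interval stock counts for one SKU.
stock = [50, 52, 51, 49, 50, 51, 12, 50]
for idx, value, z in rolling_zscore_anomalies(stock):
    print(idx, value, round(z, 1))
```

On these sample counts only the drop to 12 at index 6 is flagged. Skipping zero-spread windows keeps the arithmetic safe but means a perfectly flat history never triggers, so a fuller version would need an explicit policy for that case.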

Key Points
  • Multi-agent architecture uses LLMs to generate hypotheses, compile analytics, and validate results in a continuous loop.
  • Leverages Apache Kafka for event-driven coordination and Apache Flink for real-time stream processing.
  • Contract-driven design with typed intermediate artifacts ensures safe, observable execution of dynamically generated code.
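One way to approach safe execution of dynamically generated code is to parse the code and check it against an allowlist before running it. The sketch below does this for single Python expressions using the standard `ast` module. The allowlist, the approved functions, and the helper names are assumptions for illustration, not the contract mechanism the paper describes. Allowlisting narrows what generated code can do but is not a complete sandbox, so in practice it would be combined with other isolation measures.

```python
from __future__ import annotations

import ast

# Hypothetical contract: generated expressions may only use arithmetic,
# comparisons, boolean logic, constants, known variables, and a few functions.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.Name, ast.Load, ast.Constant, ast.Call,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
    ast.And, ast.Or, ast.Not,
    ast.Gt, ast.GtE, ast.Lt, ast.LtE, ast.Eq, ast.NotEq,
)
ALLOWED_FUNCS = {"abs": abs, "min": min, "max": max}


def check_generated_expression(source: str, variables: set[str]) -> ast.Expression:
    """Parse `source` as one expression and reject any node outside the allowlist."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        # Only direct calls to approved functions are permitted (no attribute calls).
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in ALLOWED_FUNCS
        ):
            raise ValueError("call to a non-approved function")
        if isinstance(node, ast.Name) and node.id not in variables and node.id not in ALLOWED_FUNCS:
            raise ValueError(f"unknown name: {node.id}")
    return tree


def evaluate_generated_expression(source: str, env: dict[str, float]):
    """Validate, then evaluate with no builtins and only approved names in scope."""
    tree = check_generated_expression(source, set(env))
    code = compile(tree, "<generated>", "eval")
    return eval(code, {"__builtins__": {}}, {**ALLOWED_FUNCS, **env})


print(evaluate_generated_expression(
    "abs(current - baseline) > 3 * sigma",
    {"current": 12.0, "baseline": 50.6, "sigma": 1.02},
))  # True

try:
    evaluate_generated_expression("__import__('os').system('ls')", {})
except ValueError as err:
    print("rejected:", err)
```

The first expression passes the check and evaluates normally. The second is rejected before any evaluation happens because it calls a function through attribute access, which the allowlist does not permit.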

Why It Matters

Automates real-time data exploration, freeing analysts from writing queries by hand and enabling faster detection of anomalies and trends as they emerge.