Research & Papers

Fox & Fox unveil two agentic AI frameworks to automate science workflows

Autonomous agents now curate time-series data and write reports from physics lectures.

Deep Dive

In a paper submitted to arXiv on May 25, 2026, researchers Judy Fox and Geoffrey Fox describe two agentic AI frameworks designed to automate scientific workflows. The first agent, DeepTS / DeepCollector, autonomously curates, extracts, and deduplicates large-scale time-series datasets, a notoriously tedious task in fields like climate science and finance. The second agent, DeepScribe, acts as an autonomous presentation analyzer: it ingests visually dense, mathematically complex physics lectures and converts them into structured scientific reports. Both systems use a hybrid Local Body, Remote Brain architecture running on Google Colab, in which Python-based local orchestrators invoke cloud-hosted large language model (LLM) backends. Key engineering contributions include granular attribute extraction via Cellular RAG, remote data inspection, and distributed concurrency controls, which the authors present as ways to work around the context-window and reasoning limitations of current state-of-the-art models.
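To make the Local Body, Remote Brain idea concrete, here is a minimal Python sketch of the general pattern: a local orchestrator builds prompts and fans them out to a remote reasoning backend under a bounded level of concurrency. It is an illustration under stated assumptions, not the authors' implementation. The names `run_local_body`, `RemoteBrain`, and `fake_brain` are hypothetical, and the cloud LLM call is replaced by a local stub so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical type: any function that sends a prompt to a cloud LLM
# backend and returns its text reply.
RemoteBrain = Callable[[str], str]


def run_local_body(
    items: Sequence[str], brain: RemoteBrain, max_workers: int = 4
) -> List[str]:
    """Local orchestrator: build prompts locally, delegate reasoning remotely.

    max_workers caps how many requests are in flight at once, a simple
    stand-in for the concurrency controls the article mentions.
    """

    def handle(item: str) -> str:
        prompt = f"Extract the key attributes from this record:\n{item}"
        return brain(prompt)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(handle, items))


def fake_brain(prompt: str) -> str:
    """Stub for a real cloud LLM call (assumption: no network here)."""
    return prompt.splitlines()[-1].upper()


results = run_local_body(["temp series", "price series"], fake_brain, max_workers=2)
print(results)
```

In a real deployment, `fake_brain` would be swapped for a function that calls whichever LLM service is in use; keeping the backend behind a plain callable leaves the local orchestration logic unchanged.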

Beyond the two agents, the authors outline a generalization of DeepTS to support deep knowledge graphs and discuss DeepQCD, an application to high-energy physics. By combining local computation with cloud-based LLM reasoning, the framework points to a practical path toward autonomous scientific AI that can handle the rigor and scale required for real research. The paper is available on arXiv as arXiv:2605.26305 under a permissive license. The work is particularly relevant to researchers in AI, systems engineering, and high-energy physics who want to reduce the time spent on manual data processing and report writing.

Key Points
  • DeepTS/DeepCollector automates large-scale curation, extraction, and deduplication of time-series datasets from multiple sources.
  • DeepScribe converts visually dense physics lecture slides into structured scientific reports using granular attribute extraction (Cellular RAG).
  • Both agents use a hybrid Local Body, Remote Brain architecture via Google Colab, with Python-based orchestrators calling cloud LLM backends.
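As a rough illustration of the deduplication step in the first key point, the sketch below drops time-series datasets whose content is identical to one already seen, even when the rows arrive in a different order. It uses a generic content-hash technique chosen for illustration; the article does not say how DeepTS / DeepCollector actually detects duplicates. The function names and the sample data are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Series = List[Tuple[str, float]]  # (timestamp, value) pairs


def fingerprint(series: Series) -> str:
    """Hash a series' (timestamp, value) pairs, independent of row order."""
    canonical = json.dumps(sorted(series), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(datasets: Dict[str, Series]) -> Dict[str, Series]:
    """Keep the first dataset seen for each distinct fingerprint."""
    seen = set()
    unique: Dict[str, Series] = {}
    for name, series in datasets.items():
        fp = fingerprint(series)
        if fp not in seen:
            seen.add(fp)
            unique[name] = series
    return unique


# Hypothetical sample data: source_b repeats source_a with rows reordered.
datasets = {
    "source_a": [("2024-01-01", 3.5), ("2024-01-02", 4.1)],
    "source_b": [("2024-01-02", 4.1), ("2024-01-01", 3.5)],
    "source_c": [("2024-01-02", 7.0)],
}
unique = deduplicate(datasets)
print(list(unique))
```

Exact hashing only catches byte-identical content; near-duplicates, such as the same series with rounding differences or shifted time zones, would need a tolerance-based comparison instead.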

Why It Matters

Automating data curation and report writing accelerates scientific discovery by freeing researchers from manual, error-prone tasks.