FOCAL: Filtered On-device Continuous Activity Logging for Efficient Personal Desktop Summarization
A multi-agent system that summarizes your desktop activity with 60% fewer tokens.
Researchers from Hong Kong Polytechnic University and collaborators have unveiled FOCAL (Filtered On-device Continuous Activity Logging), a privacy-first multi-agent system designed to summarize desktop activity streams directly on-device. The system tackles two core challenges: the high computational cost of processing every screenshot with a Vision-Language Model (VLM), and the problem of cross-task context pollution when handling interleaved user tasks. FOCAL employs a unified filter-plan-log architecture that cascades four specialized agents: a lightweight Filter Agent for noise suppression, a text-only Brain Agent for task attribution, a Record Agent for selective visual reasoning, and a task-isolated Memory Agent for context-coherent summarization.
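To make the filter-plan-log cascade concrete, here is a minimal Python sketch of how four such stages could be chained. It is not the authors' implementation: the `Frame` fields, the duplicate-hash filter, the window-title task label, the "title changed" rule for invoking the VLM, and the `describe_with_vlm` callback are all hypothetical stand-ins chosen only to show the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Frame:
    """One captured screenshot plus cheap text metadata (hypothetical fields)."""
    index: int
    window_title: str
    pixels_hash: str  # stand-in for the image content


@dataclass
class Pipeline:
    # Hypothetical stand-in for an expensive VLM call on one frame.
    describe_with_vlm: Callable[[Frame], str]
    vlm_calls: int = 0
    # Task-isolated memory: one list of notes per task label.
    memory: Dict[str, List[str]] = field(default_factory=dict)
    _last_hash: Optional[str] = None
    _last_title: Dict[str, str] = field(default_factory=dict)

    def filter_agent(self, frame: Frame) -> bool:
        """Toy noise filter: drop a frame identical to the previous one."""
        keep = frame.pixels_hash != self._last_hash
        self._last_hash = frame.pixels_hash
        return keep

    def brain_agent(self, frame: Frame) -> str:
        """Text-only task attribution (assumed rule: app name in the title)."""
        return frame.window_title.split(" - ")[-1].lower()

    def record_agent(self, frame: Frame, task: str) -> str:
        """Call the VLM only when this task's window title has changed."""
        if self._last_title.get(task) != frame.window_title:
            self._last_title[task] = frame.window_title
            self.vlm_calls += 1
            return self.describe_with_vlm(frame)
        return f"frame {frame.index}: still in {frame.window_title}"

    def memory_agent(self, task: str, note: str) -> None:
        """Append the note to this task's log only, never to another task's."""
        self.memory.setdefault(task, []).append(note)

    def step(self, frame: Frame) -> None:
        if not self.filter_agent(frame):
            return
        task = self.brain_agent(frame)
        self.memory_agent(task, self.record_agent(frame, task))


def run_demo() -> Pipeline:
    """Feed an interleaved A -> B -> A stream through the toy pipeline."""
    frames = [
        Frame(0, "report.docx - Word", "h0"),
        Frame(1, "report.docx - Word", "h0"),  # duplicate, filtered out
        Frame(2, "report.docx - Word", "h1"),
        Frame(3, "inbox - Mail", "h2"),        # interruption: task B
        Frame(4, "report.docx - Word", "h3"),  # back to task A
    ]
    pipeline = Pipeline(describe_with_vlm=lambda f: f"vlm:{f.index}")
    for frame in frames:
        pipeline.step(frame)
    return pipeline
```

In this toy run, notes for the Word task and the Mail task land in separate logs, so returning to task A after the interruption extends A's history rather than mixing in B's context. The real agents presumably use far richer signals than a hash and a window title.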
Tested on the DesktopBench dataset, which comprises 2,572 screenshots across 420 complex sessions, FOCAL achieved substantial efficiency gains. It reduced total token consumption by 60.4% and VLM call count by 72.3% compared to a baseline, while boosting Key Information Recall (KIR) from 0.38 to 0.61. Crucially, under A→B→A task interruptions, where a user leaves task A for task B and then returns to A, FOCAL maintained a Task Accuracy of 0.81 and a KIR of 0.80, whereas the baseline's Task Accuracy collapsed to 0.03. This work pioneers efficient, on-device summarization of instruction-free desktop streams into multi-perspective personal logs, marking a significant step for privacy-preserving productivity tools.
- FOCAL reduces VLM call count by 72.3% and token consumption by 60.4% vs baseline on DesktopBench (2,572 screenshots, 420 sessions).
- Key Information Recall (KIR) rises from 0.38 to 0.61, and under A→B→A task interruptions, Task Accuracy stays at 0.81 versus the baseline's 0.03.
- The system uses four specialized agents (Filter, Brain, Record, Memory) in a filter-plan-log architecture for on-device, privacy-first operation.
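The reported percentages can be restated as "what remains" and as a relative gain using simple arithmetic. The short Python sketch below uses only the figures quoted in this summary; the helper name `relative_change` is illustrative, and nothing here is taken from the paper's evaluation code.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# KIR rises from 0.38 to 0.61: a relative gain of roughly 60%.
kir_gain = relative_change(0.38, 0.61)

# A 60.4% token reduction leaves about 39.6% of the baseline's tokens.
tokens_remaining = 1 - 0.604

# A 72.3% reduction in VLM calls leaves about 27.7% of the baseline's calls.
vlm_calls_remaining = 1 - 0.723
```

Read this way, FOCAL issues a bit more than a quarter of the baseline's VLM calls while recalling noticeably more key information.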
Why It Matters
FOCAL shows that desktop activity can be logged and summarized on-device with far fewer VLM calls and tokens, preserving privacy while staying accurate when users switch between interleaved tasks.