Hodoscope: Visualization for Efficient Human Supervision
Researchers release visualization tool that helps humans spot AI reward hacking 10x faster
Researchers Ziqian Zhong and Shashwat Saxena have released Hodoscope, an open-source visualization tool designed to make human supervision of AI agent trajectories more efficient. The tool addresses the fragility of LLM-based monitors, which sophisticated justifications can persuade during reward hacking. Hodoscope's pipeline summarizes agent actions into behavioral summaries, embeds those summaries into a shared vector space, projects them to 2D with t-SNE, and compares kernel density estimates across agent setups to surface anomalies. Reviewers can click points to inspect the underlying actions, trace full trajectories, and search by substring or regex. Initial testing on SWE-bench traces revealed density differences between models such as o3 and others, with problematic behaviors surfacing as overrepresented regions (red) against underrepresented ones (blue).
- Open-source tool visualizes AI agent trajectories using t-SNE embeddings and kernel density comparison
- Designed to overcome fragile LLM monitors that fail to detect reward hacking with sophisticated justifications
- Human reviewers can inspect actions, trace trajectories, and search by substring or regex, reportedly 10x faster than manual review
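The pipeline above can be sketched in a few lines. This is a hypothetical illustration, not Hodoscope's actual code: the random arrays stand in for real behavioral-summary embeddings, and the setup names and grid size are invented for the example.

```python
# Sketch of the density-comparison idea: embed summaries from two agent
# setups, project into one shared 2D space with t-SNE, then compare
# per-setup kernel density estimates to surface over/underrepresented regions.
import numpy as np
from sklearn.manifold import TSNE
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Stand-ins for behavioral-summary embeddings from two setups (e.g. two
# models); real embeddings would come from a sentence-embedding model.
setup_a = rng.normal(loc=0.0, scale=1.0, size=(60, 32))
setup_b = rng.normal(loc=0.5, scale=1.0, size=(60, 32))  # slightly shifted

# Joint t-SNE so both setups share the same 2D projection.
points = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(
    np.vstack([setup_a, setup_b]))
pts_a, pts_b = points[:60], points[60:]

# Fit a KDE per setup and evaluate both on a common grid; the signed
# difference marks regions overrepresented in setup B (positive, "red")
# or in setup A (negative, "blue").
kde_a = gaussian_kde(pts_a.T)
kde_b = gaussian_kde(pts_b.T)
xs = np.linspace(points[:, 0].min(), points[:, 0].max(), 50)
ys = np.linspace(points[:, 1].min(), points[:, 1].max(), 50)
grid = np.array(np.meshgrid(xs, ys)).reshape(2, -1)
density_diff = kde_b(grid) - kde_a(grid)

print(density_diff.shape)  # (2500,) — one signed value per grid cell
```

In the interface, each grid cell's signed difference would drive the red/blue coloring, and clicking a point near a red region would pull up the underlying agent actions for review.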
Why It Matters
Enables scalable human oversight of AI systems in settings where automated monitors fail, which is crucial for detecting novel forms of reward hacking.