AI Safety

Screencasts offer scalable data for single-user AI emulation (Guardian Angels)

LessWrong AI June 27, 2026

⚡8 hours of screencasts at 1GB/hr could train a model to predict your clicks.

Deep Dive

In response to Gwern's 'Guardian Angels' concept, sophia_xu tackles a key challenge: how to personalize an AI model that emulates a single user when you lack a vast corpus of their writings. Their solution leverages screencasts – continuous recordings of on-screen activity – as rich behavioral data. Motivating examples show the need: predicting which tweets a user clicks depends on subtle, implicit taste; guessing which sections of a paper they read requires knowledge of their domain expertise. Traditional supervised labeling is impractical at scale, but screencasts passively capture decision points.

The technical plan involves a macOS app that records screenshots at 4fps, a 4fps webcam feed (for potential eye tracking), and timestamped keyboard and mouse inputs, all stored on Cloudflare's S3-compatible object storage. Current tests show roughly 1GB/hr. Even recording around the clock, that works out to about 8.8TB per year (1GB × 24 × 365), consistent with the author's estimate of under 10TB for a full year at less than $150. The author intends to build a benchmark from heuristics (e.g., did the model predict the clicked tweet?) and then hill-climb on it with various methods to improve emulation accuracy. References include Stanford HCI's user modeling work and the paper 'Creating General User Models from Computer Use,' which shares similar goals but lacks continual learning and scalable evals. The author describes initial results from 8 hours of recording as promising, and updates are expected.
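
To make the idea of aligning inputs with frames concrete, here is a minimal Python sketch. It is not the author's implementation. It assumes frames are captured at exactly 4fps starting at time zero and that each input event carries a timestamp in seconds since recording start. The names InputEvent, frame_index, and group_events_by_frame are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

FPS = 4  # capture rate stated in the article


@dataclass
class InputEvent:
    t: float   # seconds since recording start (assumed clock, hypothetical field)
    kind: str  # e.g. "click" or "key" (hypothetical labels)


def frame_index(t: float, fps: int = FPS) -> int:
    """Index of the latest frame captured at or before time t,
    assuming frame k is captured at exactly k / fps seconds."""
    return math.floor(t * fps)


def group_events_by_frame(events: List[InputEvent]) -> Dict[int, List[InputEvent]]:
    """Bucket input events under the frame that was on screen when they occurred."""
    frames: Dict[int, List[InputEvent]] = {}
    for ev in events:
        frames.setdefault(frame_index(ev.t), []).append(ev)
    return frames
```

With this layout, a click at 0.3 seconds is attached to frame 1 (captured at 0.25 seconds), so a model can be shown what the user was looking at just before acting. A real recorder would also need to handle dropped frames and clock drift, which this sketch ignores.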

Key Points

Screencasts capture user behavior at 4fps, with a webcam feed for potential eye tracking and timestamped inputs aligned to the frames, enabling passive data collection for personalization.
Storage cost is under $150/year for up to 10TB on Cloudflare, making large-scale longitudinal recording feasible.
Proposed benchmark uses heuristics (e.g., predicting which tweet a user clicks) to evaluate Guardian Angel model accuracy without manual labeling.
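
The click-prediction heuristic can be sketched as a simple top-1 accuracy score. This is an illustration under stated assumptions, not the author's benchmark. It assumes each decision point has already been extracted from a recording as a set of candidate tweet ids with model scores, plus the id the user actually clicked. The names DecisionPoint and top1_accuracy are hypothetical.

```python
from typing import Dict, List, Tuple

# One decision point: model score per candidate tweet id, and the id actually clicked.
# How candidates and clicks are extracted from the screencast is assumed, not shown.
DecisionPoint = Tuple[Dict[str, float], str]


def top1_accuracy(points: List[DecisionPoint]) -> float:
    """Fraction of decision points where the highest-scored candidate
    matches the tweet the user clicked. Returns 0.0 for an empty list."""
    if not points:
        return 0.0
    hits = 0
    for scores, clicked in points:
        predicted = max(scores, key=scores.get)  # candidate with the highest score
        hits += predicted == clicked
    return hits / len(points)
```

Because the label is just the recorded click, no manual annotation is needed, which is what makes this kind of metric suitable for hill-climbing across methods.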

Why It Matters

Passive screencast recording could enable personalized AI emulation without manual data labeling, supporting high-fidelity user modeling and scalable evaluation.
