Screencasts offer scalable data for single-user AI emulation (Guardian Angels)
8 hours of screencasts at 1GB/hr could train a model to predict your clicks.
In response to Gwern's 'Guardian Angels' concept, sophia_xu tackles a key challenge: how to personalize an AI model that emulates a single user when there is no large corpus of that user's writing. Their solution uses screencasts (continuous recordings of on-screen activity) as rich behavioral data. Motivating examples show the need: predicting which tweets a user clicks depends on subtle, implicit taste, and guessing which sections of a paper they read depends on their domain expertise. Traditional supervised labeling is impractical at scale, but screencasts capture these decision points passively.
The technical plan is a macOS app that records screenshots at 4 fps, a 4 fps webcam feed (for potential eye tracking), and timestamped keyboard and mouse input, all stored on Cloudflare R2, Cloudflare's S3-compatible object storage. Current tests show about 1 GB per hour, so a full year of recording stays under 10 TB (even nonstop recording comes to about 8.8 TB), which costs less than $150 per month to store. The author plans to build a benchmark from heuristics (e.g., did the model predict which tweet was clicked?) and then hill-climb on it, trying different methods to improve emulation accuracy. References include Stanford HCI's user modeling work and the paper 'Creating General User Models from Computer Use,' which shares similar goals but lacks continual learning and scalable evals. Initial results from 8 hours of recording look promising, and the author plans to share updates.
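A key preprocessing step implied by this setup is pairing each keyboard or mouse event with the screen the user was looking at when it happened. The sketch below is a minimal illustration, assuming every screenshot and input event carries a timestamp in seconds from the start of recording; the `InputEvent` schema and the function names are hypothetical, not taken from the author's app. Each event is attached to the latest frame captured at or before it, which yields (screen state, next action) pairs.

```python
from bisect import bisect_right
from dataclasses import dataclass

FPS = 4  # screenshot rate described in the post


@dataclass
class InputEvent:
    """Hypothetical schema for one logged keyboard or mouse event."""
    t: float      # seconds since recording started
    kind: str     # e.g. "click" or "key"
    detail: str   # e.g. the key name or "x,y" click coordinates


def frame_timestamps(duration_s: float, fps: int = FPS) -> list[float]:
    """Capture times of screenshots taken at a fixed rate starting at t = 0."""
    return [i / fps for i in range(int(duration_s * fps) + 1)]


def align_events(frames: list[float], events: list[InputEvent]) -> dict[int, list[InputEvent]]:
    """Group events under the index of the latest frame captured at or before them."""
    aligned: dict[int, list[InputEvent]] = {}
    for ev in sorted(events, key=lambda e: e.t):
        idx = bisect_right(frames, ev.t) - 1  # last frame with timestamp <= ev.t
        if idx < 0:
            continue  # event logged before the first screenshot; no screen to pair with
        aligned.setdefault(idx, []).append(ev)
    return aligned


if __name__ == "__main__":
    frames = frame_timestamps(2.0)  # 9 frames: 0.0, 0.25, ..., 2.0
    events = [
        InputEvent(0.6, "key", "a"),
        InputEvent(0.1, "click", "100,200"),
        InputEvent(0.5, "click", "5,5"),
    ]
    for idx, evs in sorted(align_events(frames, events).items()):
        print(f"frame {idx} @ {frames[idx]:.2f}s:", [e.detail for e in evs])
```

With a fixed 4 fps capture, an action can trail the screenshot it is paired with by at most a quarter second; tighter alignment would need a higher capture rate.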
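The storage estimate can be sanity-checked with simple arithmetic. This sketch treats nonstop recording as an upper bound and assumes Cloudflare R2's standard storage list price of $0.015 per GB-month; both are assumptions to verify against current pricing, not figures from the post.

```python
GB_PER_HOUR = 1.0            # recording rate reported in the post's tests
HOURS_PER_YEAR = 24 * 365    # nonstop recording: an upper bound on real usage
PRICE_PER_GB_MONTH = 0.015   # ASSUMPTION: Cloudflare R2 standard storage, USD per GB-month


def yearly_volume_gb(gb_per_hour: float = GB_PER_HOUR, hours: int = HOURS_PER_YEAR) -> float:
    """Data produced by a year of recording, in (decimal) gigabytes."""
    return gb_per_hour * hours


def monthly_storage_cost(stored_gb: float, price: float = PRICE_PER_GB_MONTH) -> float:
    """Cost of keeping `stored_gb` in object storage for one month."""
    return stored_gb * price


if __name__ == "__main__":
    year_gb = yearly_volume_gb()
    print(f"one year of recording: {year_gb:,.0f} GB (~{year_gb / 1000:.2f} TB)")
    print(f"storing one year's data: ${monthly_storage_cost(year_gb):,.2f}/month")
    print(f"storing 10 TB: ${monthly_storage_cost(10_000):,.2f}/month")
```

Under the assumed rate, 10 TB costs about $150 for each month it is kept, which suggests the under-$150 figure describes a monthly storage bill rather than a one-time or annual cost.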
- Screencasts capture user behavior at 4 fps, alongside a webcam feed for potential eye tracking and timestamped keyboard/mouse input, enabling passive data collection for personalization.
- Storage costs stay under $150 per month for up to 10 TB on Cloudflare R2, making large-scale longitudinal recording affordable.
- Proposed benchmark uses heuristics (e.g., predicting which tweet a user clicks) to evaluate Guardian Angel model accuracy without manual labeling.
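One way to turn the click heuristic into a single number is a top-k hit rate: at each decision point, check whether the item the user actually clicked is among the model's top k guesses. The post does not specify a metric, so this is an illustrative choice; `DecisionPoint`, `hit_rate_at_k`, and the `first_on_screen` baseline are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DecisionPoint:
    """Hypothetical record of one moment when the user chose among visible options."""
    candidates: list[str]  # e.g. IDs of the tweets visible on screen
    chosen: str            # the item the user actually clicked


# A ranker sees only the candidates (never the label) and returns them best guess first.
# A real model would also condition on the preceding screencast context.
Ranker = Callable[[list[str]], Sequence[str]]


def hit_rate_at_k(ranker: Ranker, points: Sequence[DecisionPoint], k: int = 1) -> float:
    """Fraction of decision points where the actual choice is among the ranker's top k."""
    if not points:
        raise ValueError("need at least one decision point")
    hits = sum(p.chosen in list(ranker(list(p.candidates)))[:k] for p in points)
    return hits / len(points)


def first_on_screen(candidates: list[str]) -> list[str]:
    """Trivial baseline: guess that the user clicks items in on-screen order."""
    return list(candidates)


if __name__ == "__main__":
    points = [
        DecisionPoint(["t1", "t2", "t3"], "t1"),
        DecisionPoint(["t4", "t5"], "t5"),
        DecisionPoint(["t6", "t7", "t8"], "t7"),
    ]
    for k in (1, 2):
        print(f"baseline hit@{k}: {hit_rate_at_k(first_on_screen, points, k):.2f}")
```

Hill-climbing then amounts to keeping whichever method raises this score on held-out recordings, measured against a trivial baseline such as always picking the first visible item.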
Why It Matters
Could enable personalized AI emulation with minimal manual labeling, supporting high-fidelity user modeling and scalable evaluation.