Researchers unveil Cogniscope: a benchmark for early-risk cognitive AI systems

200,000 simulated behavioral records over 200 days test AI safety under drift.

Deep Dive

A team of researchers (Mahfuza Farooque, Ananya Drishti, and others) has released Cogniscope, a synthetic longitudinal benchmark and browser-based evaluation framework for assessing early-risk cognitive AI systems. The framework targets a critical gap: evaluating models that must detect early signs of cognitive decline or related risk from behavioral data collected over time, under realistic challenges such as behavioral drift, sparse observations, and delayed evidence.

Cogniscope has two core components. The first is a synthetic simulation engine that generates privacy-preserving longitudinal behavioral traces aligned with configurable latent risk trajectories. The second is a browser-based data-collection instrument, implemented as a Chrome extension, that captures naturalistic video-interaction telemetry and micro-question responses during YouTube playback.
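To make drift, sparse observations, and delayed evidence concrete, here is a minimal Python sketch of how a simulator might turn a latent risk trajectory into one user's daily trace. It is not Cogniscope's engine: the logistic risk ramp, the `coherence` signal, and the parameters `drift_per_day`, `p_observed`, and `evidence_delay` are hypothetical choices made purely for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DailyObservation:
    day: int
    latent_risk: float          # hidden ground-truth risk in [0, 1]
    observed: bool              # False when the user produced no data that day
    coherence: Optional[float]  # hypothetical behavioral signal; None if unobserved


def latent_risk(day: int, onset_day: Optional[int]) -> float:
    """Hypothetical trajectory: flat low risk, or a logistic ramp centered on onset_day."""
    if onset_day is None:
        return 0.05
    return 0.05 + 0.9 / (1.0 + math.exp(-(day - onset_day) / 10.0))


def simulate_user(n_days: int = 200, onset_day: Optional[int] = None,
                  drift_per_day: float = 0.002, p_observed: float = 0.6,
                  evidence_delay: int = 14, seed: int = 0) -> List[DailyObservation]:
    """Generate one synthetic user's daily trace (illustrative, not Cogniscope's model)."""
    rng = random.Random(seed)
    trace = []
    for day in range(n_days):
        risk = latent_risk(day, onset_day)
        # Delayed evidence: the signal reflects the risk state evidence_delay days earlier.
        lagged = latent_risk(max(day - evidence_delay, 0), onset_day)
        # Draw both random numbers every day so traces with the same seed stay aligned.
        observed = rng.random() < p_observed  # sparse observations
        noise = rng.gauss(0.0, 0.05)
        # Behavioral drift: a slow baseline decline that is unrelated to risk.
        value = 1.0 - drift_per_day * day - 0.5 * lagged + noise
        trace.append(DailyObservation(day, risk, observed, value if observed else None))
    return trace


healthy = simulate_user(onset_day=None, seed=7)
at_risk = simulate_user(onset_day=120, seed=7)
print(sum(o.observed for o in at_risk), "of", len(at_risk), "days observed")
```

The drift term is what makes such a benchmark hard: in this toy model a healthy user's late-trace values (about 0.58 before noise) fall below an at-risk user's pre-onset values (about 0.88), so a fixed threshold on the raw signal confuses ordinary drift with emerging risk.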

The released benchmark includes 200,000 simulated video-interaction records from 200 users over 200 days; a 504-session, schema-aligned synthetic deployment dataset spanning nine behavioral profiles; an 18-table relational schema; baseline evaluation scripts; and time-aware metrics, including Early Risk Detection Error (ERDE) and time-to-detection (TTD).

Initial experiments show that simple behavioral coherence signals can separate simulated risk states under controlled priors, but rule-based classification of deployment profiles remains challenging, which motivates learned temporal models and robust evaluation protocols. Cogniscope is explicitly not a diagnostic system and makes no claim of clinical validity; its purpose is to give researchers a reusable testbed for studying how sequential models behave under known longitudinal challenges before deployment on real human-subject data.
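ERDE, as used in the CLEF eRisk early-risk tasks, charges a cost c_fp for a false alarm, c_fn for a missed case, nothing for a true negative, and, for a correct alert, a latency cost lc_o(k) = 1 - 1/(1 + e^(k - o)) that rises once the decision delay k passes a deadline o. The Python sketch here implements that standard formulation plus a simple TTD. Cogniscope's exact parameterization (the value of o, the costs, whether k counts days or records, and how TTD is anchored) is not given in this summary, so those choices, and the helper names, are assumptions.

```python
import math
from typing import Optional, Sequence


def latency_cost(k: int, o: int) -> float:
    """lc_o(k) = 1 - 1/(1 + e^(k - o)): near 0 well before deadline o, 0.5 at k == o, near 1 after."""
    if k - o > 700:  # avoid math.exp overflow; the cost has saturated at 1 anyway
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(k - o))


def erde(decisions: Sequence[bool], truths: Sequence[bool], delays: Sequence[int],
         o: int, c_fp: float, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Mean ERDE_o over subjects; delays[i] is the evidence seen before decision i."""
    if not (len(decisions) == len(truths) == len(delays)) or not decisions:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for decided, positive, k in zip(decisions, truths, delays):
        if decided and not positive:
            total += c_fp                       # false alarm
        elif not decided and positive:
            total += c_fn                       # missed case
        elif decided and positive:
            total += latency_cost(k, o) * c_tp  # correct, but penalized for lateness
        # true negatives cost nothing
    return total / len(decisions)


def time_to_detection(alert_days: Sequence[int], onset_day: int) -> Optional[int]:
    """Days from an (assumed known) risk onset to the first alert at or after it; None if never alerted."""
    later = [d for d in alert_days if d >= onset_day]
    return min(later) - onset_day if later else None


# Four hypothetical subjects; c_fp is set to the positive rate (3/4), following eRisk practice.
score = erde(decisions=[True, True, False, True],
             truths=[True, True, True, False],
             delays=[3, 40, 200, 10], o=5, c_fp=0.75)
print(round(score, 4))                                   # → 0.7173
print(time_to_detection([90, 120, 130], onset_day=100))  # → 20
```

Because lc_o(k) saturates near 1, a correct alert made long after the deadline costs almost as much as a miss; this is one reason ERDE is typically reported alongside a direct latency measure such as TTD.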

Key Points
  • Includes 200,000 simulated records from 200 users tracked over 200 days, plus a 504-session deployment dataset spanning 9 behavioral profiles.
  • Framework combines a synthetic simulation engine with a Chrome extension that captures naturalistic YouTube video-interaction telemetry and micro-question responses.
  • Uses time-aware metrics (ERDE, TTD) to benchmark longitudinal early-risk detection models.

Why It Matters

Provides a reusable, privacy-preserving testbed for evaluating the safety of longitudinal behavioral-monitoring models before they are deployed on real human-subject data.