Indaleko: The Unified Personal Index
PhD research prototype processes 'photos near the conference last spring' queries in under a second.
A new research prototype called Indaleko, developed as part of a PhD dissertation at the University of British Columbia, proposes a fundamental shift in how we search our personal data. The system, created by William Anthony Mason, introduces the Unified Personal Index (UPI) architecture, which is designed to align with how human memory actually works: people recall information through episodic cues like time, location, and context rather than through isolated keywords. The Indaleko prototype demonstrates this by indexing a 31-million-file dataset spanning 160TB across eight storage platforms (including Google Drive and OneDrive) into a single, searchable graph database.
Technically, Indaleko integrates temporal, spatial, and activity metadata to enable natural language queries such as 'photos near the conference venue last spring,' which existing commercial systems cannot process. Its 'memory anchor indexing' achieves sub-second query responses and maintains perfect precision for well-defined memory patterns. The evaluation shows that while platforms like Windows Search return overwhelming, unfiltered results for such queries, Indaleko handles queries that combine several of these dimensions at once. The architecture is also extensible, allowing a new data source to be integrated in anywhere from 10 minutes to 10 hours, and it preserves privacy through UUID-based semantic decoupling. This work bridges cognitive theory with distributed systems design, laying a foundation for future context-aware personal AI assistants.
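To make the idea of a multi-dimensional 'memory' query concrete, here is a minimal sketch in Python. It is not Indaleko's implementation: it uses an in-memory list rather than a graph database, and every name in it (`FileRecord`, `LABELS`, `memory_query`, the sample files and coordinates) is a hypothetical stand-in. It shows a query that filters on time, location, and a semantic tag together, with tags stored as opaque UUIDs so that the indexed records themselves carry no human-readable labels.

```python
import math
import uuid
from dataclasses import dataclass
from datetime import datetime

# Hypothetical label registry: records hold only opaque UUIDs, while the
# human-readable meaning of each UUID lives in this separate mapping.
LABELS = {
    "file_type.photo": uuid.uuid4(),
    "file_type.document": uuid.uuid4(),
}


@dataclass(frozen=True)
class FileRecord:
    name: str
    created: datetime
    lat: float
    lon: float
    tags: frozenset  # UUIDs only, no readable label names


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def memory_query(records, start, end, center, radius_km, required_tags):
    """Keep records that match the time window, the place, and the tags."""
    return [
        r for r in records
        if start <= r.created < end
        and haversine_km(r.lat, r.lon, center[0], center[1]) <= radius_km
        and required_tags <= r.tags
    ]


PHOTO = frozenset({LABELS["file_type.photo"]})
DOC = frozenset({LABELS["file_type.document"]})
VENUE = (49.2827, -123.1207)  # assumed conference venue coordinates

records = [
    FileRecord("keynote.jpg", datetime(2024, 4, 12), 49.2890, -123.1170, PHOTO),
    FileRecord("autumn.jpg", datetime(2024, 11, 3), 49.2890, -123.1170, PHOTO),
    FileRecord("toronto.jpg", datetime(2024, 4, 13), 43.6532, -79.3832, PHOTO),
    FileRecord("slides.pdf", datetime(2024, 4, 12), 49.2827, -123.1207, DOC),
]

# "Photos near the conference venue last spring"
results = memory_query(
    records,
    start=datetime(2024, 3, 1),
    end=datetime(2024, 6, 1),
    center=VENUE,
    radius_km=2.0,
    required_tags=PHOTO,
)
print([r.name for r in results])  # ['keynote.jpg']
```

Each of the other three files fails exactly one dimension (wrong season, wrong city, wrong file type), which is the kind of filtering a keyword-only search cannot express. A real system would push these predicates into indexed database queries rather than scanning a list.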
- Prototype indexes 31 million files across 160TB and 8 storage platforms into a unified graph.
- Enables natural 'memory' queries (time/location/context) with sub-second responses, where commercial tools such as Windows Search return unfiltered results.
- Extensible architecture integrates new data sources in 10 minutes to 10 hours and uses UUIDs for privacy.
Why It Matters
It could power future AI assistants that truly understand and retrieve your personal data the way you remember it.