Research & Papers

Vector DB and ANN vs. PHE conflict: is there a practical workaround? [D]

PHE breaks ANN search. Can metadata filtering and BLOBs replace vector databases?

Deep Dive

A developer working with vector databases, approximate nearest neighbor (ANN) indexes such as HNSW and IVF, and Partially Homomorphic Encryption (PHE) has hit a fundamental conflict: ANN algorithms cannot index PHE-encrypted embeddings efficiently, so every query falls back to an exact linear scan, which removes the speed advantage of a vector database. As a workaround, they propose dropping the vector DB entirely, storing embeddings as BLOBs in a traditional relational database, and using metadata (such as RFID identifiers or labels) to narrow the candidate set before computing similarity on the smaller subset.
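The proposed pipeline can be sketched in a few lines. This is a minimal illustration, not the poster's implementation: it uses SQLite from the Python standard library, the table and column names (`embeddings`, `tag`, `vec`) are hypothetical, and plaintext float32 vectors stand in for the ciphertexts that a PHE deployment would store.

```python
import math
import sqlite3
from array import array

def to_blob(vec):
    """Pack a list of floats into a float32 BLOB."""
    return array("f", vec).tobytes()

def from_blob(blob):
    """Unpack a float32 BLOB back into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return out.tolist()

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_db(rows):
    """Create an in-memory table from (tag, vector) pairs. Schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE embeddings ("
        "id INTEGER PRIMARY KEY, tag TEXT NOT NULL, vec BLOB NOT NULL)"
    )
    # The index on the metadata column is what makes the pre-filter cheap.
    conn.execute("CREATE INDEX idx_embeddings_tag ON embeddings(tag)")
    conn.executemany(
        "INSERT INTO embeddings (tag, vec) VALUES (?, ?)",
        [(tag, to_blob(vec)) for tag, vec in rows],
    )
    conn.commit()
    return conn

def search(conn, tag, query, k=3):
    """Filter by tag first, then run an exact scan over the survivors only."""
    cur = conn.execute("SELECT id, vec FROM embeddings WHERE tag = ?", (tag,))
    scored = [(cosine(query, from_blob(blob)), row_id) for row_id, blob in cur]
    scored.sort(reverse=True)
    return [(row_id, score) for score, row_id in scored[:k]]

conn = build_db([
    ("door-a", [1.0, 0.0, 0.0]),
    ("door-a", [0.0, 1.0, 0.0]),
    ("door-b", [1.0, 0.0, 0.0]),
])
results = search(conn, "door-a", [0.9, 0.1, 0.0], k=1)
```

The scan inside `search` is still linear; the design only helps if the metadata filter reliably leaves a small candidate set.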

The main open questions are whether this metadata-first filtering pipeline can scale to 1 million+ embeddings without becoming a performance bottleneck, and whether it simply reinvents a worse version of a vector DB. The developer also asks about hybrid approaches (secure enclaves, partial decryption, or tiered search) and whether any production systems already do privacy-preserving vector search at scale. The post underscores a real tension between strong encryption and low-latency retrieval in high-dimensional search.
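A rough way to reason about the scaling question is to count how many per-dimension operations one query costs before and after filtering. The sketch below is back-of-the-envelope arithmetic only: the embedding dimension (768) and the number of distinct tag values (1,000) are assumptions that do not come from the post, and it assumes embeddings are spread evenly across tags.

```python
def scan_cost(n_total, dim, num_tags):
    """Return (candidates, per-dimension operations) for one query.

    Assumes embeddings are spread evenly over num_tags tag values,
    so a tag filter keeps n_total // num_tags candidates.
    """
    candidates = n_total // num_tags
    return candidates, candidates * dim

# No filtering: every one of the 1M embeddings is scanned.
full = scan_cost(1_000_000, 768, 1)          # (1000000, 768000000)

# Hypothetical filter with 1,000 evenly populated tags.
filtered = scan_cost(1_000_000, 768, 1_000)  # (1000, 768000)
```

Under PHE each of those operations is a modular exponentiation or multiplication rather than a float multiply, so the constant factor is far larger than in a plaintext scan. A skewed tag distribution also breaks the even-spread assumption: a query against the most popular tag could still touch a large share of the table.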

Key Points
  • PHE forces an exact linear scan over all 1M+ embeddings, which removes the speed gains of ANN indexes such as HNSW or IVF
  • Proposed workaround: store embeddings as BLOBs in a standard DB and use metadata tags to narrow candidate sets before similarity search
  • Questions remain about scaling to 1M+ embeddings and about whether production-ready hybrid approaches, such as secure enclaves, exist
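
To make the first point concrete, here is a toy sketch of an encrypted dot product. The post does not name a PHE scheme, so Paillier (additively homomorphic) is an assumption, as are the tiny hard-coded primes and the integer-quantized embeddings; nothing here is secure or production-ready. The server can compute an encrypted score for each stored vector but cannot compare two encrypted scores, so it has no basis for pruning and must touch every candidate.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # With generator g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^-1 mod n.
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # g^m = (n + 1)^m = 1 + m*n (mod n^2)
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Decrypt ciphertext c with private key (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def encrypted_dot(n, enc_vec, query):
    """Server side: combine ciphertexts into an encryption of sum(x_i * q_i).

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to a
    plaintext integer power multiplies its plaintext by that integer.
    """
    n2 = n * n
    acc = 1  # a valid encryption of 0
    for c, w in zip(enc_vec, query):
        acc = acc * pow(c, w, n2) % n2
    return acc

n, priv = keygen()
database = [[3, 1, 4], [1, 5, 9]]   # integer-quantized embeddings
enc_db = [[encrypt(n, x) for x in vec] for vec in database]

query = [2, 7, 1]                   # plaintext query weights
enc_scores = [encrypted_dot(n, vec, query) for vec in enc_db]  # one per row
scores = [decrypt(n, priv, c) for c in enc_scores]             # key holder only
```

Only the key holder can turn `enc_scores` into rankable numbers, which is why graph or cluster traversal on the server side has nothing to navigate by.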

Why It Matters

Balancing encryption and speed in vector search is critical for privacy-first AI apps at scale.