Open Source

VecML runs RAG on 200K documents on Snapdragon X2 laptop

Indexes 200,000 files on-device; retrieval uses only 1,200 tokens and a 128-shard active buffer

Deep Dive

VecML has demonstrated on-device RAG running on a laptop built around Qualcomm's new Snapdragon X2 chipset. Installed on an ASUS Zenbook A16 3K OLED Touchscreen with the Snapdragon X2 Elite Extreme (2026), the system was pointed at approximately 200,000 local files, about 100,000 of which had been indexed during the demo run. The headline figure is that retrieval required only 1,200 tokens, and memory usage stayed low because most data was offloaded to disk, with just a 128-shard active buffer held in memory. VecML attributes this to its in-house unified AI database platform, which integrates the core functionality of six typically separate database systems: vector, graph, relational, key-value, search, and document stores.
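VecML has not published how its shard buffer works, so the following is only a minimal sketch of the general idea: keep a bounded number of index shards in memory and leave the rest on disk. It assumes a least-recently-used eviction policy, and the `ShardBuffer` class, the `load_shard` callback, and the shard layout are all hypothetical names chosen for illustration, not VecML's API.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict

# Hypothetical shard representation: any mapping loaded from disk.
Shard = Dict[str, Any]


class ShardBuffer:
    """Keeps at most `capacity` shards in memory; all others stay on disk.

    Assumption: least-recently-used eviction. The source does not say
    which policy VecML actually uses.
    """

    def __init__(self, load_shard: Callable[[int], Shard], capacity: int = 128):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._load_shard = load_shard  # reads one shard from disk
        self._capacity = capacity
        self._active: "OrderedDict[int, Shard]" = OrderedDict()
        self.loads = 0  # number of disk loads, useful for inspection

    def get(self, shard_id: int) -> Shard:
        if shard_id in self._active:
            # Cache hit: mark the shard as most recently used.
            self._active.move_to_end(shard_id)
            return self._active[shard_id]
        # Cache miss: load from disk, then evict the oldest shard if full.
        shard = self._load_shard(shard_id)
        self.loads += 1
        self._active[shard_id] = shard
        if len(self._active) > self._capacity:
            self._active.popitem(last=False)
        return shard

    def __len__(self) -> int:
        return len(self._active)
```

In use, a retrieval routine would call `buffer.get(shard_id)` for each shard a query touches, so memory stays bounded by the buffer capacity no matter how many shards exist on disk. The trade-off is extra disk reads whenever a query touches a shard that has been evicted.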

Behind the scenes, this unified architecture enables joint optimization across indexing, retrieval, graph traversal, storage, and memory management. The result is a RAG system that operates entirely locally and that VecML describes as fast and accurate. According to VecML, NPU-based indexing ran at roughly 50% of the speed of an RTX 5060 laptop, but in a much lighter and quieter form factor. The laptop itself was praised for being light enough to carry single-handed across an airport and for its very portable power adapter, though its power draw still exceeds United Airlines' in-flight charging limit.

VecML also announced that its macOS AI-PC software is now open for controlled testing. For enterprise teams, the appeal is running large-scale document retrieval on lightweight laptops with no cloud dependency, which preserves privacy and enables offline workflows. As AI-PCs powered by Qualcomm's NPU become more common, this kind of on-device RAG could become a standard feature for professionals who need to query large personal or corporate document collections securely and efficiently.

Key Points
  • VecML's AI-PC software on a Snapdragon X2 laptop indexed ~200K files; retrieval needed only 1,200 tokens, and a 128-shard active buffer kept memory use low
  • The unified database architecture integrates vector, graph, relational, key-value, search, and document stores for optimized on-device RAG
  • NPU-accelerated indexing reaches ~50% of the speed of an RTX 5060 laptop in a lighter, quieter form factor

Why It Matters

Brings enterprise-scale RAG to lightweight laptops, enabling private, offline AI document analysis without cloud costs.