Open Source

Semantic video search using local Qwen3-VL embedding, no API, no transcription

An 8B-parameter model embeds raw video directly into vectors, enabling natural-language search without cloud APIs or transcription.

Deep Dive

A developer has created SentrySearch, a command-line tool that uses Alibaba's Qwen3-VL-Embedding models to run semantic search directly on raw video files. Rather than transcribing audio or captioning frames, the model embeds the video itself into a vector space. Users search footage with natural-language queries, and the system matches clips by semantic similarity rather than by text matching. The tool was originally built on Google's Gemini embedding API; a local backend powered by the Qwen models was added after users asked for offline operation.
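As a rough illustration of what matching by semantic similarity means, the sketch below ranks a few video chunks against a query by cosine similarity. It is not SentrySearch's code: the three-dimensional vectors are made-up stand-ins for real model embeddings, and the file names, the 30-second chunk boundaries, and the `search` helper are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical index: one toy vector per video chunk (real embeddings
# would come from the model and have far more dimensions).
index = [
    {"file": "cam1.mp4", "start": 0.0, "end": 30.0, "vec": [0.9, 0.1, 0.0]},
    {"file": "cam1.mp4", "start": 30.0, "end": 60.0, "vec": [0.1, 0.9, 0.2]},
    {"file": "cam2.mp4", "start": 0.0, "end": 30.0, "vec": [0.0, 0.2, 0.9]},
]

def search(query_vec, chunks, top_k=1):
    """Return the top_k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:top_k]

# Stand-in for the embedding of a text query.
query_vec = [0.2, 0.8, 0.1]
best = search(query_vec, index)[0]
print(best["file"], best["start"], best["end"])
```

Because the query text and the video chunks are compared as vectors, no transcript or caption is needed at search time.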

The local backend is light enough for consumer hardware. The 8-billion-parameter Qwen3-VL-Embedding model produces "genuinely usable" results while running fully locally and needs approximately 18GB of RAM; a smaller 2-billion-parameter variant runs in about 6GB. SentrySearch indexes video footage into a ChromaDB vector database. When a user submits a text query, the system finds the semantically closest video embeddings and automatically trims the matching clip. Compared with cloud-based video search APIs, this offers data privacy, no usage costs, and the ability to process sensitive or proprietary footage offline.
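The trimming step can be pictured as turning a matched time range into a cut command. The sketch below assumes ffmpeg does the cutting, which the summary does not state; the `trim_command` helper, the file names, and the two-second padding are hypothetical. It only builds the command and does not run it.

```python
def trim_command(src, start, end, out, padding=2.0):
    """Build (but do not run) an ffmpeg command that cuts a padded clip from src."""
    begin = max(0.0, start - padding)  # never seek before the start of the file
    finish = end + padding
    return [
        "ffmpeg",
        "-ss", f"{begin:.2f}",   # seek to the clip start in the input
        "-to", f"{finish:.2f}",  # stop reading the input at the clip end
        "-i", src,
        "-c", "copy",            # stream copy: fast, but cuts snap to keyframes
        out,
    ]

# Hypothetical match: seconds 30 to 60 of cam1.mp4.
cmd = trim_command("cam1.mp4", 30.0, 60.0, "match.mp4")
print(" ".join(cmd))
```

Stream copy avoids re-encoding, at the cost of frame-accurate cut points; a tool that needs exact boundaries would re-encode instead.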

Key Points
  • Uses Qwen3-VL-Embedding models to embed raw video into vectors, eliminating the need for transcription or captioning.
  • The 8B model runs locally in ~18GB of RAM and the 2B version in ~6GB, with testing on Apple Silicon and CUDA.
  • The SentrySearch CLI indexes video into ChromaDB, searches it with natural language, and auto-trims matching clips.
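The memory figures in the list above are roughly what a back-of-envelope estimate predicts if the weights are held at 16-bit precision. That precision is an assumption, since the summary gives no breakdown, and the `weight_gb` helper is purely illustrative.

```python
def weight_gb(params_billion, bytes_per_param=2):
    """Approximate weight memory in GB (10^9 bytes), assuming 16-bit weights by default."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for name, params, reported in [("8B", 8, 18), ("2B", 2, 6)]:
    weights = weight_gb(params)
    # Whatever remains would have to cover activations, video frames, and runtime overhead.
    print(f"{name}: ~{weights:.0f} GB weights, ~{reported - weights:.0f} GB left of the reported ~{reported} GB")
```

Under that assumption the weights alone account for about 16GB and 4GB, leaving around 2GB of headroom in each case.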

Why It Matters

SentrySearch enables private, low-cost semantic video search over sensitive content, removing the dependency on cloud APIs and transcription services.