Open Source

We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local

A fully local 27B model beats cloud-based deep research agents with zero telemetry.

Deep Dive

The LDR maintainer announced that Qwen3.6-27B, running locally on a single RTX 3090 (24GB) via Ollama, achieved 95.7% accuracy on the SimpleQA benchmark (287/300 correct) when paired with a LangChain‑based agentic search strategy. The approach, built with langgraph_agent, decomposes queries into parallel subtopics, performs up to 50 iterative tool calls, and uses the same model for self‑grading (validated against Opus). This beats the reported scores of Perplexity Deep Research (93.9%) and Tavily (93.3%) while running entirely offline.
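The loop described above — subtopic decomposition, capped iterative tool calls, and self-grading by the same model — can be sketched in plain Python. Here `decompose`, `search`, and `grade` are hypothetical stand-ins for the LLM-backed calls in the real langgraph_agent pipeline; the control flow is an illustration under those assumptions, not LDR's implementation.

```python
MAX_ITERATIONS = 50  # cap on iterative tool calls, as stated in the announcement

def research(question, decompose, search, grade):
    """Decompose a question into subtopics, gather evidence iteratively,
    and stop once the model's self-grade accepts the draft answer.

    decompose, search, and grade are injected stand-ins for the
    LLM-backed steps; this sketch runs them sequentially, whereas the
    real system fans subtopics out in parallel.
    """
    subtopics = decompose(question)
    evidence = []
    for i in range(MAX_ITERATIONS):
        topic = subtopics[i % len(subtopics)]   # round-robin over subtopics
        evidence.append(search(topic))          # one tool call per iteration
        answer = " ".join(evidence)
        if grade(question, answer):             # same model self-grades the draft
            return answer, i + 1
    return " ".join(evidence), MAX_ITERATIONS   # give up at the iteration cap
```

The early-exit on a passing self-grade is what keeps typical runs far below the 50-call ceiling.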

Beyond benchmarks, LDR ships unique features for professional local research: a Journal Quality System that grades academic sources using OpenAlex and DOAJ, per‑user SQLCipher AES‑256 databases with PBKDF2‑HMAC‑SHA512 key derivation (admins cannot read data at rest), zero telemetry, and cosign‑signed Docker images with SLSA provenance. The project is fully MIT‑licensed and open source. Caveats include potential SimpleQA contamination on newer base models and noise from LLM grading, though the maintainer reports consistent performance in daily use.
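The key-derivation scheme named above maps directly onto Python's standard library. The iteration count below matches SQLCipher 4's documented default; LDR's actual salt handling and parameters are assumptions for this sketch.

```python
import hashlib

# Sketch of per-user key derivation: PBKDF2-HMAC-SHA512 producing a
# 256-bit (32-byte) key for an AES-256 SQLCipher database. The 256,000
# iteration count is SQLCipher 4's default, assumed here rather than
# taken from LDR's source.
def derive_key(passphrase: str, salt: bytes, iterations: int = 256_000) -> bytes:
    return hashlib.pbkdf2_hmac(
        "sha512",            # HMAC digest, per the announcement
        passphrase.encode(),
        salt,                # per-user random salt (assumed)
        iterations,
        dklen=32,            # 32 bytes = 256-bit AES key
    )
```

Because the key is derived from the user's passphrase rather than stored server-side, an admin with filesystem access holds only ciphertext, which is what "admins cannot read data at rest" implies.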

Key Points
  • Qwen3.6-27B on a single RTX 3090 scores 95.7% on SimpleQA using agentic search with langgraph_agent (up to 50 iterations).
  • Outperforms cloud services like Perplexity Deep Research (93.9%) and Tavily (93.3%) on the same benchmark.
  • Includes academic journal grading, AES‑256 encrypted storage, zero telemetry, and MIT license — all fully local.
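The journal grading mentioned above could reduce to a small rubric over source metadata. As a rough illustration, the function below combines DOAJ listing and an OpenAlex-style impact signal into a letter grade; the fields and thresholds are invented for this sketch and are not LDR's actual rules.

```python
# Hypothetical journal-quality rubric. The real system queries OpenAlex
# and DOAJ for source metadata; the inputs and cutoffs below are
# illustrative assumptions only.
def grade_journal(in_doaj: bool, openalex_h_index: int, retracted: bool) -> str:
    if retracted:                              # retracted work is excluded outright
        return "reject"
    if in_doaj and openalex_h_index >= 100:    # indexed + high-impact venue
        return "A"
    if in_doaj or openalex_h_index >= 50:      # one strong signal
        return "B"
    return "C"                                 # no quality signals found
```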

Why It Matters

Proves top‑tier AI research performance is achievable completely offline with privacy and full user control.