Open Source

Apodex releases Smol 0.8B-4B models for agentic verification

Specialized tiny models take on giant generalists in long-horizon agent tasks

Deep Dive

Apodex AI has released open weights for its Smol model series (0.8B, 2B, and 4B parameters), designed specifically for agentic verification in long-horizon tasks. Rather than relying on a monolithic 70B+ model for every step, these compact models act as specialized sub-agents within the AgentOS runtime, handling structured tasks such as source cross-examination, hypothesis testing, and tool-grounded synthesis. The models are trained to treat external text as "claims" to be verified, execute precise tool calls, and validate outputs before returning results to the main controller. The goal is to make multi-step agent workflows cheaper to run and more reliable, since each verification step is handled by a small model built for that job instead of a large general-purpose one.
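The claim-verification pattern described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Apodex's code or the AgentOS API: the names `Claim`, `Verdict`, `verify`, and `run_subagent` are hypothetical, and the lookup tool and judge are stubs standing in for a real tool call and a small verification model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Claim:
    text: str    # statement extracted from external text
    source: str  # where it came from (URL, document id, ...)


@dataclass(frozen=True)
class Verdict:
    claim: Claim
    supported: bool
    evidence: str


# Hypothetical interfaces: a tool maps a query to evidence text, and a
# judge (standing in for a small verification model) decides whether
# the evidence supports the claim.
Tool = Callable[[str], str]
Judge = Callable[[str, str], bool]


def verify(claim: Claim, tool: Tool, judge: Judge) -> Verdict:
    """Treat the claim as unverified until a tool call backs it up."""
    evidence = tool(claim.text)
    supported = bool(evidence) and judge(claim.text, evidence)
    return Verdict(claim, supported, evidence)


def run_subagent(claims: List[Claim], tool: Tool, judge: Judge) -> Dict[str, List[str]]:
    """Verify every claim, validate the verdicts, then report to the controller."""
    verdicts = [verify(c, tool, judge) for c in claims]
    # Output validation: never return a "supported" verdict without evidence.
    for v in verdicts:
        if v.supported and not v.evidence:
            raise ValueError(f"supported verdict lacks evidence: {v.claim.text}")
    return {
        "supported": [v.claim.text for v in verdicts if v.supported],
        "unverified": [v.claim.text for v in verdicts if not v.supported],
    }


# Stub tool and judge so the sketch runs without any model or network access.
FACTS = {"water boils at 100 c at sea level": "Boiling point of water at 1 atm: 100 C."}


def stub_lookup(query: str) -> str:
    return FACTS.get(query.lower(), "")


def stub_judge(claim_text: str, evidence: str) -> bool:
    return "100 C" in evidence


claims = [
    Claim("Water boils at 100 C at sea level", "page-a"),
    Claim("The moon is made of cheese", "page-b"),
]
report = run_subagent(claims, stub_lookup, stub_judge)
print(report)
```

In a real setup, the stub judge would be replaced by a call to one of the small models and the stub lookup by an actual search or retrieval tool; the controller would receive only the validated report, not the raw external text.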

Alongside the models, Apodex open-sourced AgentHarness, a framework for evaluating whether local agent loops stay free of drift over 50+ steps. Its flagship model, Apodex-1.0-H, which uses these verification layers at scale, posted strong benchmark scores: 94.4 on DeepSearchQA, 90.3 on BrowseComp, 60.8 on HLE-Text, and 74.2 on SuperChem. FrontierScience Research remains a challenge at 46.7. The team invites the community to experiment with local agent orchestration using small verification models, and hints at possible GGUF/EXL2 quantizations depending on feedback.

Key Points
  • Open-weight Smol models (0.8B/2B/4B) specialized for agentic verification, not conversational fluency
  • AgentHarness evaluation framework released for testing long-horizon local agent loops without drift
  • Flagship model Apodex-1.0-H scores 94.4 on DeepSearchQA and 90.3 on BrowseComp using these verification layers

Why It Matters

The release makes it practical to run local agent workflows on small, verification-focused models instead of massive 70B+ LLMs.