Open Source

Cactus Hybrid Router lets Gemma4-2B match Gemini by routing 15-55% of tasks locally

A 65k-parameter router slashes cloud costs while matching frontier model performance

Deep Dive

Cactus Compute built "Cactus Hybrid Router," a 65k-parameter model that decides on the fly whether to complete a task with an edge model or route it to a frontier cloud model. It offers an adjustable edge-cloud ratio for tuning resource allocation, holds up when the edge model is quantized (using Cactus Quants' 4-bit uniform quantization, which comes close to FP16), and handles text-only, vision, and audio prompts. The router is open source on GitHub.
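To make the adjustable edge-cloud ratio concrete, here is a minimal sketch of a threshold router. It is not the Cactus implementation: the names `HybridRouter`, `length_score`, and `calibrate` are hypothetical, and the word-count difficulty score is a stand-in for the learned 65k-parameter model. The idea shown is only that a per-prompt score plus one tunable threshold sets what share of traffic goes to the cloud.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


def length_score(prompt: str) -> float:
    """Hypothetical difficulty score: longer prompts count as harder (capped at 1.0)."""
    return min(len(prompt.split()) / 20.0, 1.0)


@dataclass
class HybridRouter:
    """Toy threshold router; an illustration, not the Cactus Hybrid Router."""

    score_fn: Callable[[str], float]  # assumption: higher score = harder for the edge model
    threshold: float = 0.5

    def route(self, prompt: str) -> str:
        # Prompts scoring above the threshold are sent to the cloud model.
        return "cloud" if self.score_fn(prompt) > self.threshold else "edge"

    def calibrate(self, prompts: Sequence[str], cloud_fraction: float) -> None:
        """Set the threshold so about `cloud_fraction` of `prompts` go to the cloud."""
        if not 0.0 <= cloud_fraction <= 1.0:
            raise ValueError("cloud_fraction must be between 0 and 1")
        scores = sorted(self.score_fn(p) for p in prompts)
        n_cloud = round(cloud_fraction * len(scores))
        if n_cloud == 0:
            self.threshold = float("inf")    # nothing goes to the cloud
        elif n_cloud >= len(scores):
            self.threshold = float("-inf")   # everything goes to the cloud
        else:
            # Highest score that should still stay on the edge.
            self.threshold = scores[len(scores) - n_cloud - 1]


if __name__ == "__main__":
    sample = ["hi", "summarize this short note", "prove " + "a very long statement " * 5]
    router = HybridRouter(score_fn=length_score)
    router.calibrate(sample, cloud_fraction=0.34)
    for p in sample:
        print(router.route(p), "<-", p[:30])
```

Calibrating on a sample of representative prompts is one simple way to hit a target ratio; ties in the score can make the realized fraction differ slightly from the requested one.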

Key Points
  • A 65k-parameter router model decides in real time whether to run tasks locally (Gemma4-2B) or route them to the cloud (Gemini 3.1 Flash Lite)
  • Routes 15-55% of tasks to cloud while matching Gemini's performance, cutting inference cost significantly
  • Supports text, vision, and audio prompts, with an adjustable edge-cloud ratio and 4-bit quantization that stays close to FP16 accuracy
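For readers unfamiliar with the term, the following is a generic sketch of uniform 4-bit quantization, in which each weight is mapped to one of 16 evenly spaced levels. The details of Cactus Quants' scheme are not described here, so this is a textbook illustration under that assumption, and the function names are hypothetical. Production schemes typically use per-group or per-channel scales rather than one scale for the whole tensor.

```python
from typing import List, Tuple


def quantize_uniform_4bit(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map floats to integer codes 0..15 on an evenly spaced grid (asymmetric)."""
    lo, hi = min(weights), max(weights)
    # 16 levels means 15 steps between min and max; fall back to 1.0 for constant input.
    scale = (hi - lo) / 15 or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Reconstruct approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    w = [-1.0, -0.5, 0.0, 0.25, 1.0]
    codes, scale, lo = quantize_uniform_4bit(w)
    restored = dequantize(codes, scale, lo)
    worst = max(abs(a - b) for a, b in zip(w, restored))
    print("codes:", codes)
    print("max reconstruction error:", worst)
```

The rounding error per weight is bounded by half a step (`scale / 2`), which is why a narrow weight range quantizes more accurately than a wide one.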

Why It Matters

Enables cost-effective AI inference by offloading simple queries to local models, reducing cloud dependency for live AI and coding agents.