196B total params, only 11B active, multimodal (ViT 1.8B), runs on 128GB RAM.

Beats DeepSeek V4 Flash (55.6%) on SWE-Bench Pro (56.26%) and matches Gemini 3.5 Flash?

Beats DeepSeek V4 Flash (55.6%) on SWE-Bench Pro (56.26%) and matches Gemini 3.5 Flash.

DeepSearchQA F1 92.82% near GPT-5.5's 93.98%; HLE w/ tools 47.2% — excellent for agentic tasks?

DeepSearchQA F1 92.82% near GPT-5.5's 93.98%; HLE w/ tools 47.2% — excellent for agentic tasks.

Open Source

StepFun's Step 3.7 Flash: 196B MoE runs locally on 128GB RAM

r/LocalLLaMA May 29, 2026

⚡11B active params beat DeepSeek and match Gemini Flash on coding benchmarks.

Deep Dive

StepFun dropped Step 3.7 Flash, a massive 196B-parameter Mixture-of-Experts model with only 11B active parameters, plus a built-in 1.8B ViT for vision tasks. The key innovation: running locally on a single machine with 128GB RAM, making it one of the most capable self-hostable models for agent workflows. Benchmarks show it significantly outperforms its active parameter count: 56.26% on SWE-Bench Pro (besting DeepSeek V4 Flash's 55.6% and tying Gemini 3.5 Flash's 55.1%), 92.82% F1 on DeepSearchQA (competitive with GPT-5.5’s 93.98%), and 47.2% on HLE with tools. This suggests strong reasoning and code generation capabilities despite the sparse activation.

For professionals, this means high-end coding and agentic AI can now run locally, avoiding API costs and latency. The MoE architecture keeps inference efficient while maintaining quality from the full 196B parameter pool. StepFun offers access via OpenRouter and NVIDIA NIM for those who don't want to self-host, but the local option is the real headline. If you have 128GB RAM, this is a compelling alternative to cloud-based flash models, especially for sensitive or high-volume agentic workflows.

Key Points

Step 3.7 Flash: 196B total params, only 11B active, multimodal (ViT 1.8B), runs on 128GB RAM.
Beats DeepSeek V4 Flash (55.6%) on SWE-Bench Pro (56.26%) and matches Gemini 3.5 Flash.
DeepSearchQA F1 92.82% near GPT-5.5's 93.98%; HLE w/ tools 47.2% — excellent for agentic tasks.

Why It Matters

A top-tier flash model runs locally, unseating cloud competitors for coding and agent workflows.

Read Original Article

StepFun's Step 3.7 Flash: 196B MoE runs locally on 128GB RAM

Why It Matters

Related Articles

🚀 Stay Ahead in AI