Open Source

Reddit rant declares only two local LLMs matter: Qwen 3.6 variants

A viral post says your GPU doesn't care: just cram in a garbage quant of a 35B model.

Deep Dive

A Reddit post titled 'Stop asking what model to run. There are literally only two.' has ignited fierce debate in the local LLM community. The author, u/Wrong_Mushroom_7350, argues that Hugging Face is effectively empty and that only two models exist: Qwen 3.6 35b a3b and Qwen 3.6 27b. The post mocks users who meticulously tune small models run at full precision, claiming that a 'garbage quant of a massive model' performs far better than a pristine micro-model. The author suggests ignoring VRAM constraints and letting system RAM absorb whatever does not fit on the GPU.

The post's hyperbolic tone is clearly bait, but it drew a huge reaction, with both agreement and outrage. The author later admitted to expecting downvotes and instead watched the post blow up. Beneath the humor is a real tension in open-source AI: whether to run large, heavily quantized models or smaller models at higher precision. The rant also takes a jab at contrarians who complain about open-source shortcomings, telling them to just pay for Claude Code instead. Although the post is not meant to be taken literally, it taps into genuine frustrations about model selection and hardware limitations.
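To put rough numbers on that trade-off, here is a back-of-the-envelope sketch of how much memory model weights need at a given quantization level, and how much would spill from VRAM into system RAM. Everything in it is an assumption for illustration: the post names no bit widths, so 4-bit and 16-bit are stand-ins, the 4B "micro-model" and the 12 GiB GPU are hypothetical, and the estimate covers weights only (no KV cache, activations, or quantization metadata).

```python
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def spillover_gib(weights_gib: float, vram_gib: float) -> float:
    """Portion of the weights that does not fit in VRAM (0 if it all fits)."""
    return max(0.0, weights_gib - vram_gib)


if __name__ == "__main__":
    vram = 12.0  # hypothetical GPU with 12 GiB of VRAM

    # (label, parameters in billions, assumed bits per weight)
    candidates = [
        ("35B at 4-bit", 35, 4),
        ("27B at 4-bit", 27, 4),
        ("4B at 16-bit (hypothetical small model)", 4, 16),
    ]

    for label, params, bits in candidates:
        size = weight_gib(params, bits)
        spill = spillover_gib(size, vram)
        print(f"{label}: ~{size:.1f} GiB weights, ~{spill:.1f} GiB spills to system RAM")
```

Under these assumptions the quantized 35B model needs roughly 16 GiB for weights, so about a quarter of it would sit in system RAM on the hypothetical 12 GiB card, while the small full-precision model fits entirely in VRAM. That gap is the cost the post tells readers to ignore.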

Key Points
  • The post claims only Qwen 3.6 35b a3b and Qwen 3.6 27b are worth running locally and dismisses every other model on Hugging Face.
  • It advocates heavily quantized (low-bit) large models over small full-precision ones, even at the cost of speed and higher RAM usage.
  • It suggests that users whose needs local models don't meet should stop complaining and pay for proprietary tools like Claude Code.

Why It Matters

The post reflects growing frustration in the open-source community over model selection and the trade-offs between model size, quantization, and practicality.