Open Source

New LLM user overwhelmed by tools with RTX 5090 and 64GB RAM

r/LocalLLaMA June 10, 2026

⚡Gemma vs Qwen, 27B vs 35B, and 17GB vs 24GB – where to start?

Deep Dive

The r/LocalLLaMA subreddit saw a post from a user (u/cryptospartan) who is brand new to running LLMs locally, despite owning a powerhouse system: an AMD 9950X3D CPU, 64GB of DDR5 RAM, and an RTX 5090 GPU. They express frustration with the overwhelming number of tools and model variants available on GitHub and Ollama. Specifically, they can't decide on a Windows GUI (built-in Ollama is too barebones), they see model names like 'gemma4-26B-A4B-it-UD-Q4_K_M' that look like alphabet soup, and they don't understand the practical difference between 27B vs 35B parameter versions of Qwen3.6 (17GB vs 24GB sizes). They also question whether a larger model is always better if it fits in VRAM.

The post resonated because it mirrors a near-universal experience: the local LLM ecosystem exploded in 2025 with dozens of quantizations, model families (Gemma, Qwen, Llama, Mistral), and inference backends and frontends (Ollama, LM Studio, text-generation-webui, etc.). Experienced users chimed in with advice: use LM Studio or Open WebUI for a polished GUI on Windows; compare the GPU's VRAM (32GB on the RTX 5090) against the model file size plus context overhead; and prefer the larger model (35B over 27B) if it fits, but measure inference speed before committing. The post underscores a growing need for curated beginner guides and simpler naming conventions.
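The "file size plus context overhead" check can be sketched in a few lines of Python. This is a rough estimate, not a measurement: the layer count, KV-head count, and head dimension below are hypothetical placeholders (the real values come from each model's config), the 1.5 GiB runtime overhead is an assumed allowance, and the post's file sizes are treated as approximately GiB.

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV-cache size in GiB.

    Each token stores one key and one value vector (hence the factor 2)
    per layer and per KV head; bytes_per_value=2 assumes an fp16 cache.
    """
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1024**3


def fits_in_vram(model_file_gib, vram_gib, kv_gib, overhead_gib=1.5):
    """Return (fits, headroom_gib) for weights + KV cache + assumed runtime overhead."""
    needed = model_file_gib + kv_gib + overhead_gib
    return needed <= vram_gib, vram_gib - needed


# Hypothetical architecture numbers, chosen only for illustration.
kv = kv_cache_gib(n_layers=48, n_kv_heads=8, head_dim=128, context_tokens=32768)

# The RTX 5090 is specified at 32 GB of VRAM; 17 and 24 are the file sizes from the post.
for label, size in [("27B (17GB file)", 17), ("35B (24GB file)", 24)]:
    ok, headroom = fits_in_vram(size, vram_gib=32, kv_gib=kv)
    print(f"{label}: fits={ok}, headroom={headroom:.1f} GiB")
```

Under these assumptions the larger model still fits, but with little headroom, which is why the advice to test real inference speed and context length matters more than the arithmetic alone.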

Key Points

The user's RTX 5090 has 32GB of VRAM, so qwen3.6:35b (24GB) fits on the GPU with roughly 8GB left for context and runtime overhead; very long contexts could still force partial offloading to system RAM.
Name tags carry information beyond raw parameter count: 'A4B' marks a mixture-of-experts model with roughly 4B parameters active per token, 'UD' conventionally denotes an Unsloth Dynamic quantization, and 'Q4_K_M' is the quantization type. All three affect speed and quality.
The community suggests LM Studio, Open WebUI, or text-generation-webui as full-featured alternatives on Windows to Ollama's default interface.
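To make the "alphabet soup" concrete, here is a minimal sketch that splits a name like the one quoted in the post into its parts. The function `describe_model_name` is a hypothetical helper, not part of any tool, and the tag meanings it encodes are common community conventions (for example, 'UD' as used for Unsloth Dynamic quants) rather than a formal standard, so real repositories may deviate.

```python
import re


def describe_model_name(name):
    """Best-effort breakdown of a hyphen-separated GGUF-style model name."""
    info = {
        "family": None,
        "total_params": None,
        "active_params": None,
        "instruction_tuned": False,
        "unsloth_dynamic": False,
        "quant": None,
    }
    for i, token in enumerate(name.split("-")):
        if i == 0:
            info["family"] = token  # e.g. "gemma4"
        elif re.fullmatch(r"\d+(\.\d+)?B", token, re.IGNORECASE):
            info["total_params"] = token  # total parameter count, e.g. "26B"
        elif re.fullmatch(r"A\d+(\.\d+)?B", token, re.IGNORECASE):
            info["active_params"] = token[1:]  # parameters active per token in an MoE model
        elif token.lower() in ("it", "instruct"):
            info["instruction_tuned"] = True
        elif token.upper() == "UD":
            info["unsloth_dynamic"] = True  # assumed meaning: Unsloth Dynamic quant
        elif re.fullmatch(r"I?Q\d\w*", token, re.IGNORECASE):
            info["quant"] = token  # quantization type, e.g. "Q4_K_M"
    return info


parts = describe_model_name("gemma4-26B-A4B-it-UD-Q4_K_M")
for key, value in parts.items():
    print(f"{key}: {value}")
```

Unrecognized tokens are silently ignored, which keeps the sketch short; a more careful version would collect them so that unfamiliar tags are surfaced rather than dropped.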

Why It Matters

Even high-end hardware doesn't make local LLM choices obvious; new users need clearer documentation and benchmarks.
