Open Source

Qwen3.5-9B-Claude-4.6-Opus-Uncensored-v2-Q4_K_M-GGUF

Community-built AI model runs 42 tokens/sec on an RTX 3060, combining reasoning, coding, and creative writing.

Deep Dive

A new open-source AI model named 'OmniClaw-Qwen3.5-9B-Claude-4.6-Opus-Uncensored-v2' has been released on HuggingFace by developer LuffyTheFox. The 9-billion-parameter model merges four distinct models: Jackrong's distilled Claude 4.6 reasoning model, HauhauCS's aggressively uncensored Qwen 3.5 base, Tesslate's OmniCoder for programming, and nbeerbower's creative writing model. The merge uses explicit weight balancing (1.0 for the base, 0.5 each for the coding and creative models) and was performed in Float32 precision to avoid degrading the merged weights before quantization. The final model is quantized to the efficient Q4_K_M format via llama.cpp, making it small enough for consumer hardware.
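The post does not name the exact merge tool or algorithm, but the stated weights suggest a linear (weighted-average) merge of parameter tensors. As a purely illustrative sketch of that technique, with tensors flattened to lists:

```python
def linear_merge(tensors: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of same-shaped parameter tensors (flattened here)."""
    total = sum(weights)
    size = len(tensors[0])
    return [
        sum(w * t[i] for t, w in zip(tensors, weights)) / total
        for i in range(size)
    ]


# Toy example using the post's weight scheme: 1.0 for the base model,
# 0.5 each for the coding and creative-writing models.
base = [1.0, 1.0]
coder = [2.0, 0.0]
writer = [0.0, 2.0]
merged = linear_merge([base, coder, writer], [1.0, 0.5, 0.5])
```

In a real merge the same weighted average would be applied tensor by tensor across all layers, typically via a dedicated merge toolkit rather than hand-rolled code.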

Performance is a key highlight: the creator reports 42 tokens per second on a consumer-grade NVIDIA RTX 3060 GPU when the model is run in LM Studio. The model targets users who want a 'big context window' and high capability in a local, uncensored package, free of the content filters of corporate models like ChatGPT or Claude. Detailed instructions for optimal performance are provided, including a specific system prompt and sampler settings (temperature 0.7, top-p 0.8). The release represents a significant community effort, leveraging public datasets like 'claude-opus-4.6-10000x' to create a powerful, multi-talented AI tool outside traditional development channels.
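The recommended sampler settings are for LM Studio; as a rough illustration of what they actually do to a model's next-token distribution, here is a minimal stdlib sketch of temperature plus nucleus (top-p) sampling, not LM Studio's implementation:

```python
import math
import random


def sample_token(logits, temperature=0.7, top_p=0.8, rng=None):
    """Temperature scaling followed by nucleus (top-p) sampling over raw logits."""
    rng = rng or random.Random(0)
    # Temperature < 1.0 sharpens the distribution toward likely tokens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalise over the nucleus and draw a token from it.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0.0:
            return i
    return kept[-1]
```

With temperature 0.7 and top-p 0.8, low-probability tokens are cut from the candidate pool entirely, which tends to trade some diversity for coherence.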

Key Points
  • Model is a 4-way merge combining Claude 4.6 reasoning, uncensored Qwen 3.5, OmniCoder for programming, and a creative writing model.
  • Achieves 42 tokens/second performance on a consumer RTX 3060 GPU, optimized via Q4_K_M quantization for local efficiency.
  • Explicitly built to be 'uncensored' and run locally, providing an alternative to restricted corporate AI models with a blend of reasoning, coding, and creative skills.
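The efficiency claim in the second point can be sanity-checked with back-of-envelope arithmetic. The ~4.85 effective bits per weight for Q4_K_M is an approximation commonly cited for llama.cpp quant types, not a figure from the post:

```python
PARAMS = 9e9        # 9-billion-parameter model
F32_BITS = 32       # full precision
Q4_K_M_BITS = 4.85  # approx. effective bits per weight for Q4_K_M (assumption)


def gguf_gigabytes(bits_per_weight: float) -> float:
    """Approximate weight-storage size in decimal gigabytes."""
    return PARAMS * bits_per_weight / 8 / 1e9


print(f"F32:    {gguf_gigabytes(F32_BITS):.1f} GB")     # ~36.0 GB
print(f"Q4_K_M: {gguf_gigabytes(Q4_K_M_BITS):.2f} GB")  # ~5.46 GB
```

Roughly 36 GB of Float32 weights shrink to about 5.5 GB, which is why the quantized model fits comfortably in an RTX 3060's 12 GB of VRAM with room left for the KV cache.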

Why It Matters

Democratizes high-performance, multi-purpose AI for local use, giving developers and researchers an uncensored, cost-effective alternative to API-based models.