Open Source

AI chessboard test: Qwen3.6 and ZAYA1 models battle SVG generation accuracy

r/LocalLLaMA May 12, 2026

⚡ ZAYA1 8B nails the SVG but needs the RSA technique; Qwen3.6 35B-A3B adds extra pawns.

Deep Dive

A Reddit user ran a quality test comparing several local and cloud-hosted AI models on their ability to generate a chessboard SVG. The test covered Qwen3.6 variants (27B and 35B-A3B) in different MLX quantizations, the open-weight ZAYA1 8B, and derivatives such as GRM 2.6 Plus from OrionLLM. The results show that a higher bit count does not guarantee better output: Qwen3.6 27B MLX oQ6 (6-bit) produced a good, correct board but omitted row/column labels, while its 3.5-bit oQ3.5e version was poor. Surprisingly, Qwen3.6-27B-neo-code-di-imatrix-max performed well at 4-bit (IQ4_NL), but its 5-bit (Q5_K_S) variant was totally wrong.
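Errors such as "extra pawns" can be checked mechanically rather than by eye. The following is a minimal sketch, not the method used in the original test. It assumes the prompt asked for the standard starting position and that the model draws pieces as Unicode chess glyphs inside `<text>` or `<tspan>` elements; models that draw pieces as paths would need a different check. The names `EXPECTED`, `count_pieces`, and `piece_errors` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the target is the standard starting position, so these are
# the expected counts per Unicode chess glyph (white first, then black).
EXPECTED = {
    "♙": 8, "♟": 8,   # pawns
    "♖": 2, "♜": 2,   # rooks
    "♘": 2, "♞": 2,   # knights
    "♗": 2, "♝": 2,   # bishops
    "♕": 1, "♛": 1,   # queens
    "♔": 1, "♚": 1,   # kings
}

def count_pieces(svg_text):
    """Count chess glyphs found in <text>/<tspan> elements of an SVG string."""
    root = ET.fromstring(svg_text)
    counts = Counter()
    for element in root.iter():
        # Drop the XML namespace prefix, e.g. "{http://www.w3.org/2000/svg}text".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("text", "tspan"):
            for char in element.text or "":
                if char in EXPECTED:
                    counts[char] += 1
    return counts

def piece_errors(svg_text):
    """Return {glyph: surplus_or_deficit} for every glyph with a wrong count."""
    counts = count_pieces(svg_text)
    return {g: counts[g] - n for g, n in EXPECTED.items() if counts[g] != n}
```

An empty result from `piece_errors` means every piece count matches; a board with two surplus white pawns would report `{"♙": 2}`. The check covers counts only, not square placement or visual quality.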

ZAYA1 8B generated a perfect SVG when accessed via the Zaya cloud playground (presumably FP16), but local inference with MLX-LM failed because the model relies on the RSA technique; the 8-bit quant entered a reasoning loop without producing output. This suggests that local inference engines need better support for RSA-based models. The test also covered derivative models such as GRM 2.6 Plus and Qwopus 35B-A3B-v1, with mixed results: GRM 2.6 Plus Q4_K_M (4-bit) was correct and visually good, but its 3-bit version degraded. Overall, the smallest viable quant was a 27B model at 3-bit (Q3_K_M), which hints at lightweight deployment on consumer hardware (<12GB RAM).
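The <12GB figure can be sanity-checked with back-of-the-envelope arithmetic. This is a rough sketch using nominal bit widths only; the function name `weight_memory_gb` is hypothetical. Real quant formats usually average more bits per weight than their nominal label, and the KV cache and runtime add further memory, so treat the results as a lower bound.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes): parameters * bits / 8."""
    return params_billions * bits_per_weight / 8

# Nominal estimates for two configurations mentioned in the test.
for label, params_b, bits in [
    ("8B model at 8-bit", 8, 8),
    ("27B model at 3-bit", 27, 3),
]:
    print(f"{label}: ~{weight_memory_gb(params_b, bits):.1f} GB of weights")
```

Under these assumptions the 8B model at 8-bit needs about 8 GB and the 27B model at 3-bit about 10.1 GB, both below 12 GB before any overhead is counted.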

Key Points

ZAYA1 8B required RSA technique for output; local 8-bit quant failed (reasoning loop) despite <12GB memory usage.
Qwen3.6 35B-A3B MLX oQ4 (4-bit) produced near-perfect SVG with 2 extra pawns and confusing cursor triangles.
GRM 2.6 Plus (OrionLLM) 4-bit quant (Q4_K_M) performed correctly and visually well; 5-bit equivalent failed unexpectedly.

Why It Matters

The test shows that quantized models can match larger ones, but the inference technique (RSA) and the quantization level critically affect output quality.
