Open Source

Abliterlitics: Benchmark and Tensor Analysis Comparing Qwen 3/3.5 with HauhauCS / Heretic / Huihui models

A forensic benchmark shows abliterated Qwen models achieving 98-99.8% attack success rates while preserving most of their capabilities.

Deep Dive

An independent researcher has published a comprehensive forensic analysis comparing three prominent 'abliteration' techniques used to remove safety filters from open-source AI models. The study examined the Heretic (by p-e-w), HauhauCS Aggressive, and Huihui methods applied to five Qwen models (2B to 27B parameters), using full benchmark suites, safety evaluations, weight analysis, and KL divergence measurements. The findings reveal that all three techniques successfully bypass safety mechanisms, with 98-99.8% attack success rates on HarmBench's 400 harmful behaviors, while retaining 95-99% of the original models' capabilities across eight standard benchmarks.
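An attack success rate of this kind reduces to a simple fraction: the share of harmful behaviors for which a judge marks the model's response as compliant. The study's actual judging pipeline is not shown here; this is a minimal sketch with a hypothetical `attack_success_rate` helper and pre-computed judge verdicts:

```python
def attack_success_rate(judgements):
    """Percentage of harmful behaviors the model complied with.

    judgements: one boolean per HarmBench behavior, True if a judge
    marked the model's response as a successful attack.
    (Hypothetical helper; the judge logic itself is omitted.)
    """
    return 100.0 * sum(judgements) / len(judgements)

# e.g. 397 of 400 behaviors judged successful:
print(attack_success_rate([True] * 397 + [False] * 3))  # prints 99.25
```

The headline numbers in the study would come from feeding each of HarmBench's 400 behaviors to the abliterated model and collecting one verdict per behavior.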

Notably, the HauhauCS method demonstrated the most balanced performance, with a 99.2% safety bypass rate and minimal capability degradation; it even improved the 2B model's GSM8K math score from 57.09 to 57.39. The analysis also showed that architecture matters: hybrid Mamba2+Transformer models (Qwen3.5) suffered less collateral damage during abliteration than pure Transformer architectures. The researcher faced community backlash, including bans from the HauhauCS Discord, but published the full methodology and raw data on HuggingFace for independent verification.

The technical deep dive included SVD analysis, fingerprint matching, edit vector overlap, and per-layer weight examination, run on RTX 5090 and RTX 4090 GPUs. For the 27B model, the researcher employed BitsAndBytes 4-bit quantization, which preserves the relative performance deltas between variants. The study represents the most comprehensive public comparison of model 'uncensoring' techniques to date, providing quantitative evidence that current safety mechanisms can be systematically removed with minimal performance impact.
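The study's SVD tooling is not reproduced here, but the core idea of weight-delta SVD analysis can be sketched: abliteration edits that project a single 'refusal direction' out of a weight matrix should appear as a near-rank-1 difference between original and edited weights. A minimal sketch on synthetic matrices, with a hypothetical `edit_effective_rank` helper:

```python
import numpy as np

def edit_effective_rank(w_base, w_edited, energy=0.99):
    """Number of singular values of the weight delta needed to capture
    `energy` of its squared Frobenius norm (hypothetical helper)."""
    s = np.linalg.svd(w_edited - w_base, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

# Synthetic demo: projecting out one direction is a rank-1 edit.
rng = np.random.default_rng(1)
w = rng.normal(size=(512, 512))
d = rng.normal(size=512)
d /= np.linalg.norm(d)                   # unit "refusal direction"
w_ablit = w - np.outer(w @ d, d)         # remove that direction's output
print(edit_effective_rank(w, w_ablit))   # prints 1
```

A spectrum concentrated in one singular value is the kind of fingerprint that distinguishes a targeted directional edit from broad fine-tuning, which spreads energy across many components.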

Key Points
  • All three abliteration methods achieved 98-99.8% attack success rates on HarmBench's 400 safety tests
  • HauhauCS showed best balance with 99.2% safety bypass and only 0.0201 KL divergence from original models
  • Hybrid Mamba2+Transformer architectures (Qwen3.5) proved more resilient to capability loss during modification than pure Transformers
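A KL divergence figure like the 0.0201 cited above compares the next-token distributions of the original and edited models, averaged over token positions. A minimal sketch with synthetic logits standing in for real model outputs (hypothetical `mean_token_kl` helper, not the study's code):

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_token_kl(base_logits, edited_logits):
    """Mean per-token KL(P_base || P_edited).

    Both arguments have shape (tokens, vocab); real measurements
    would use logits from the original and abliterated models on
    the same prompts.
    """
    p = softmax(base_logits)
    q = softmax(edited_logits)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(kl.mean())

# Synthetic demo: a tiny perturbation yields a tiny divergence.
rng = np.random.default_rng(0)
base = rng.normal(size=(128, 1000))
edited = base + 0.01 * rng.normal(size=base.shape)
print(mean_token_kl(base, edited))
```

A value near zero means the edited model's token distributions barely moved, which is how a low number like 0.0201 supports the claim of minimal capability loss.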

Why It Matters

Reveals fundamental vulnerabilities in current AI safety approaches and quantifies the trade-offs between model safety and capability preservation.