Nemotron-3-Super-120B Uncensored
A newly released, uncensored 120B-parameter AI model scores 97% on HarmBench, a benchmark for harmful content generation, indicating its safety refusals have been almost entirely removed.
DeAlign AI has released a significant new entry in the uncensored AI model space: 'Nemotron-3-Super-120B-A12B-4bit-MLX-CRACK-Uncensored.' This is a 120-billion-parameter model based on NVIDIA's Nemotron-3 architecture with its built-in safety and refusal mechanisms removed. The creator acknowledges that the initial release was flawed, stating that the model's unusual hybrid architecture, which combines a Latent Mixture-of-Experts (LatentMoE) design with Mamba attention layers, required a 24-hour rebuild and special handling. That architecture breaks standard quantization workflows: all processing must be done directly at the target quantization level, with Q6 and Q8 versions promised soon.
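For context, the standard MLX path converts and quantizes a model from full-precision Hugging Face weights in a single pass via mlx_lm; per the creator, the LatentMoE/Mamba hybrid breaks this flow, which is why the weights had to be rebuilt directly at 4-bit. A minimal sketch of the standard workflow this model reportedly cannot use as-is (the source repository path is hypothetical):

```python
# Standard MLX convert-and-quantize flow (pip install mlx-lm).
# The creator reports this one-shot path fails for the LatentMoE/Mamba
# hybrid, forcing all processing at the target bit-width instead.
from mlx_lm import convert

convert(
    hf_path="DeAlignAI/Nemotron-3-Super-120B-A12B",  # hypothetical source repo
    mlx_path="nemotron-3-super-4bit-mlx",            # local output directory
    quantize=True,                                   # quantize during conversion
    q_bits=4,                                        # 4-bit weights, matching this release
    q_group_size=64,                                 # mlx_lm's default group size
)
```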
The model's reported metrics are striking: 97% on HarmBench, a benchmark for evaluating harmful content generation (a high score here means the model complies with requests that safety-aligned models refuse), and 94% on HumanEval for coding tasks. It is packaged specifically for Apple's MLX framework, allowing it to run on Macs with Apple Silicon. However, because native MLX does not yet support LatentMoE, users must run the model through a provided custom Python script or wait for MLX Studio to add native support. The release bundles the necessary chat template and script, a rare case of a creator shipping custom modifications to work around architectural constraints for local execution.
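Since native MLX cannot yet instantiate LatentMoE layers, the bundled script presumably stands in for the usual loader; once support lands, the standard mlx_lm flow should apply. A minimal sketch of that standard flow, assuming the model path matches the release name (the prompt is purely illustrative):

```python
# What inference would look like with native MLX support; today the
# release's custom Python script replaces mlx_lm.load for this model.
from mlx_lm import load, generate

model, tokenizer = load("Nemotron-3-Super-120B-A12B-4bit-MLX-CRACK-Uncensored")

# The release ships its own chat template; apply it before generating.
messages = [{"role": "user", "content": "Summarize the Mamba architecture."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```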
- The 120B-parameter model scored 97% on HarmBench and 94% on HumanEval, indicating high capability with its safety filters removed.
- Uses a hybrid LatentMoE and Mamba attention architecture that requires a custom script to run under Apple's MLX framework, which lacks native support.
- Released as a 4-bit quantization for MLX, with Q6 and Q8 versions to follow, offering a powerful uncensored option for local AI development.
Why It Matters
Provides developers with a highly capable, uncensored AI model that runs locally on Apple hardware, pushing the boundaries of accessible, unrestricted large language models.