Open Source

Ternary Bonsai: Top intelligence at 1.58 bits

New 1.58-bit AI models cut memory use by roughly 9x while outperforming most peers on standard benchmarks.

Deep Dive

PrismML has introduced Ternary Bonsai, a groundbreaking family of language models that pushes the frontier of AI efficiency. These models, available in 8B, 4B, and 1.7B parameter sizes, use a novel 1.58-bit quantization scheme with ternary weights {-1, 0, +1}: each weight takes one of three values, so it carries log2(3) ≈ 1.58 bits of information. The result is a memory footprint approximately 9x smaller than conventional 16-bit models, a critical advancement for deploying capable AI on edge devices, smartphones, and other hardware with strict memory constraints.
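PrismML has not published the details of its quantization recipe, so the sketch below is only a minimal illustration of how ternary weights work in general, using an absmean-style quantizer of the kind popularized by BitNet-b1.58-style models; the function names, per-tensor scaling, and error check are assumptions, not PrismML's implementation.

    import numpy as np

    def ternary_quantize(W: np.ndarray):
        """Quantize a float weight matrix to {-1, 0, +1} with a per-tensor
        scale (absmean-style; illustrative only, not PrismML's recipe)."""
        scale = np.mean(np.abs(W)) + 1e-8          # scale so typical weights land near +/-1
        W_t = np.clip(np.round(W / scale), -1, 1)  # snap each weight to a ternary level
        return W_t.astype(np.int8), scale

    def dequantize(W_t: np.ndarray, scale: float) -> np.ndarray:
        """Recover an approximate float matrix to measure quantization error."""
        return W_t.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    W = rng.normal(0.0, 0.02, size=(1024, 1024)).astype(np.float32)
    W_t, s = ternary_quantize(W)
    print("levels:", np.unique(W_t))                                 # [-1  0  1]
    print("mean abs error:", np.abs(W - dequantize(W_t, s)).mean())

Because each weight carries only about 1.58 bits, the theoretical reduction against FP16 is 16 / 1.58 ≈ 10x; the ~9x figure quoted for the models presumably reflects overhead such as scales and any layers kept in higher precision.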

Building on the company's earlier 1-bit Bonsai models, Ternary Bonsai represents a strategic pivot on the efficiency curve, trading a modest increase in model size for a significant gain in performance. Initial benchmarks indicate these compact models outperform most peers in their respective parameter classes, challenging the assumption that high compression necessitates a major sacrifice in accuracy. The models are released as FP16 safetensors on Hugging Face for compatibility with standard tooling, alongside a more efficient packed MLX 2-bit format, signaling a move towards specialized, hardware-optimized deployment.
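The release does not describe the on-disk layout of the packed MLX 2-bit format, but the basic idea of 2-bit packing is simple: a ternary weight needs only three codes, so four weights fit in one byte. The sketch below, with assumed helper names and an assumed code mapping, illustrates that packing and is not the actual MLX container format.

    import numpy as np

    def pack_ternary_2bit(w: np.ndarray) -> np.ndarray:
        """Pack ternary weights {-1, 0, +1} at 2 bits each, four per byte.
        Code mapping (-1 -> 0b00, 0 -> 0b01, +1 -> 0b10) is an assumption."""
        codes = (w.astype(np.int16) + 1).astype(np.uint8).reshape(-1, 4)
        return (codes[:, 0]
                | (codes[:, 1] << 2)
                | (codes[:, 2] << 4)
                | (codes[:, 3] << 6)).astype(np.uint8)

    def unpack_ternary_2bit(packed: np.ndarray) -> np.ndarray:
        """Inverse of pack_ternary_2bit: recover the ternary weights."""
        codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
        return codes.reshape(-1).astype(np.int8) - 1

    # Toy example; assumes the weight count is divisible by 4.
    w = np.random.default_rng(0).integers(-1, 2, size=4096).astype(np.int8)
    packed = pack_ternary_2bit(w)
    assert np.array_equal(w, unpack_ternary_2bit(packed))
    print(f"{w.nbytes} bytes unpacked -> {packed.nbytes} bytes packed")  # 4096 -> 1024

Packed at 2 bits per weight, the 8B model's weight tensors would shrink from roughly 16 GB in FP16 to about 2 GB, which is what makes on-device deployment plausible.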

The release targets a clear market need for powerful yet efficient models that can run locally without constant cloud connectivity. While the current sizes are impressive, the AI community is already anticipating larger 20-40B parameter versions, which could fundamentally alter the landscape for 'large' models by making them drastically more portable and cost-effective to run.

Key Points
  • Uses 1.58-bit ternary weights {-1, 0, +1} for a memory footprint ~9x smaller than 16-bit models.
  • Available in three sizes (8B, 4B, and 1.7B parameters), with each model outperforming most peers in its class on benchmarks.
  • Released as FP16 safetensors on Hugging Face, with a packed MLX 2-bit format for efficient deployment.

Why It Matters

Enables powerful AI to run on smartphones and edge devices, reducing reliance on cloud infrastructure and associated costs.