Open Source

OpenBMB's BitCPM-CANN 1.58-bit model runs on Huawei Ascend 910B

A 1.58-bit quantized model cuts memory use by 4x on Chinese-made AI chips.

Deep Dive

New BitCPM-CANN models from OpenBMB are being tested on Huawei's Ascend 910B accelerator, according to a post.

Key Points
  • BitCPM-CANN uses 1.58-bit ternary quantization, restricting each weight to -1, 0 or +1 (log2 3 ≈ 1.58 bits of information per weight), for a 4x memory reduction over FP16.
  • Tested on Huawei Ascend 910B, with inference speeds comparable to conventional FP16 models.
  • Retains over 96% of baseline accuracy on NLP benchmarks despite the aggressive compression.
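The Key Points do not say how full-precision weights are mapped onto -1, 0 and +1. A minimal sketch, assuming an absmean scheme like the one described for BitNet b1.58 (BitCPM-CANN's actual recipe is not given in the post), divides each weight tensor by its mean absolute value, rounds, and clamps. The helper names `ternary_quantize`, `pack_2bit` and `unpack_2bit` are hypothetical. The snippet also works through the storage arithmetic: a ternary weight carries log2 3 ≈ 1.58 bits, and packing four weights per byte gives an 8x weight-only reduction versus FP16. That number covers raw weight storage only; the post does not say how its 4x figure was measured, so the two are not directly comparable.

```python
import math

import numpy as np


def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Map weights to {-1, 0, +1} plus one per-tensor scale.

    Assumption: absmean scaling in the style of BitNet b1.58,
    not confirmed as BitCPM-CANN's method.
    """
    scale = np.abs(w).mean() + eps            # per-tensor mean absolute value
    q = np.clip(np.round(w / scale), -1, 1)   # round, then clamp to the ternary set
    return q.astype(np.int8), scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate the original weights from ternary codes and the scale."""
    return q.astype(np.float32) * scale


def pack_2bit(q: np.ndarray) -> np.ndarray:
    """Pack ternary codes four per byte, 2 bits each (-1 -> 0, 0 -> 1, +1 -> 2)."""
    codes = (q.ravel() + 1).astype(np.uint8)
    pad = (-codes.size) % 4                    # pad to a multiple of 4 codes
    codes = np.concatenate([codes, np.zeros(pad, dtype=np.uint8)]).reshape(-1, 4)
    return (codes[:, 0] | (codes[:, 1] << 2)
            | (codes[:, 2] << 4) | (codes[:, 3] << 6)).astype(np.uint8)


def unpack_2bit(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_2bit: recover the first n ternary codes as int8."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11
    return codes.ravel()[:n].astype(np.int8) - 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.02, size=(256, 256)).astype(np.float32)
    q, scale = ternary_quantize(w)
    packed = pack_2bit(q)

    fp16_bytes = w.size * 2                    # 2 bytes per FP16 weight
    print("bits per ternary weight (log2 3):", round(math.log2(3), 2))  # 1.58
    print("FP16 bytes:", fp16_bytes, "packed bytes:", packed.nbytes)
    print("weight-only ratio vs FP16:", fp16_bytes / packed.nbytes)     # 8.0
```

Two bits per weight leaves one of the four codes unused; denser layouts such as packing five ternary values into one byte (3^5 = 243 ≤ 256, or 1.6 bits per weight) save space at the cost of more expensive unpacking. Which layout BitCPM-CANN uses on Ascend is not stated.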

Why It Matters

Extreme quantization makes it practical to run LLMs on more affordable chips, cutting hardware costs and reducing dependence on GPUs.