OpenBMB's BitCPM-CANN 1.58-bit model runs on Huawei Ascend 910B
A 1.58-bit quantized model reportedly achieves 4x memory savings over FP16 on Chinese-made AI chips.
Deep Dive
According to a post, OpenBMB's BitCPM-CANN models are being tested on the Huawei Ascend 910B accelerator.
Key Points
- BitCPM-CANN uses 1.58-bit ternary quantization: each weight takes one of three values (-1, 0, +1), which carries log2(3) ≈ 1.58 bits of information, for a reported 4x memory reduction over FP16.
- Tested on the Huawei Ascend 910B, where it reportedly reaches inference speeds comparable to conventional FP16 models.
- Retains over 96% accuracy on NLP benchmarks despite aggressive compression.
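
To make the ternary idea in the points above concrete, here is a minimal Python sketch of one common way to map weights to -1, 0, and +1: divide by the mean absolute value, round, and clamp. This "absmean" scheme is an assumption chosen for illustration; the post does not say how BitCPM-CANN actually quantizes its weights, and the function names `ternary_quantize` and `dequantize` are hypothetical.

```python
import math


def ternary_quantize(weights):
    """Map floats to {-1, 0, +1} with one shared scale.

    Assumed scheme (absmean): not confirmed as BitCPM-CANN's method.
    Expects a non-empty list of floats.
    """
    # Scale = mean absolute value; fall back to 1.0 if every weight is zero.
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    # Divide by the scale, round to the nearest integer, clamp to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Approximate the original weights from ternary codes."""
    return [c * scale for c in codes]


# Information carried by one three-valued weight, in bits (about 1.585).
BITS_PER_TERNARY_WEIGHT = math.log2(3)

weights = [0.9, -0.05, -1.3, 0.4, 0.02, -0.7]  # made-up example values
codes, scale = ternary_quantize(weights)
print(codes)  # [1, 0, -1, 1, 0, -1]
print(round(BITS_PER_TERNARY_WEIGHT, 3))  # 1.585
```

Small weights collapse to 0 and large ones saturate at -1 or +1, so only the sign and a single shared scale survive. That information loss is why accuracy retention is the figure to watch with this kind of compression.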
Why It Matters
Extreme quantization enables LLMs on affordable chips, cutting hardware costs and reducing GPU dependency.