Developer Tools

b7990

llama.cpp, the popular open-source inference engine, just gained support for a major new model family.

Deep Dive

The llama.cpp project, a leading open-source inference engine for running LLMs locally, has released version b7990 with official support for Alibaba's Qwen 3.5 model series. The update lets developers and users run Qwen models efficiently on their own hardware across macOS, Linux, and Windows, with backends including CUDA, Vulkan, and Apple Silicon. The release also includes code cleanup and removes the DeepStack feature for now.
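For readers who want to try this, the workflow follows llama.cpp's standard CLI tools. A minimal sketch, assuming you have already downloaded a GGUF build of a Qwen 3.5 model (the model path below is a placeholder, and the context size is illustrative):

```shell
# Build llama.cpp from source (CMake is the supported build system)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# One-off generation against a local GGUF file
# (-m model path is a placeholder; -n limits tokens generated)
./build/bin/llama-cli -m /path/to/qwen3.5-model.gguf -p "Hello" -n 128

# Or serve the model over a local OpenAI-compatible HTTP API
./build/bin/llama-server -m /path/to/qwen3.5-model.gguf --port 8080 -c 4096
```

The `llama-server` route is often the more practical one, since its OpenAI-compatible endpoint lets existing client tooling point at the local model with only a base-URL change.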

Why It Matters

Official support in llama.cpp significantly expands the local AI toolkit, letting users benchmark and deploy a top-performing open model series on their own hardware without relying on third-party forks.