b8095
The latest release patches a dispatch error affecting large matrix-vector multiplications in the WebGPU GPU backend.
The open-source project llama.cpp, maintained by ggml-org, released commit b8095, which fixes a bug in how its WebGPU backend dispatches large matrix-vector multiplication operations. The WebGPU backend is one of several GPU backends the project ships alongside CUDA and Vulkan, and this patch makes it more dependable for running large language models (LLMs) such as Llama 3 on Apple Silicon (macOS/iOS), Windows, and other platforms. Users running models on local GPUs through the WebGPU path no longer hit this specific dispatch error.
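The commit itself only says the fix concerns dispatching large matrix-vector multiplications; the exact mechanism is not described here. A common failure mode in WebGPU compute code is requesting more workgroups along one dimension than the device's maxComputeWorkgroupsPerDimension limit (65535 by default in the spec), which can happen when a weight matrix has a very large number of rows. The sketch below is not taken from the patch; it uses hypothetical names to illustrate the general pattern of splitting one oversized dispatch into several smaller ones that each stay within the limit.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical sketch: plan a matrix-vector multiply over `rows` rows as
// several compute dispatches, each within WebGPU's per-dimension workgroup
// limit (65535 by default per the spec).
struct Dispatch {
    uint32_t row_offset;   // first row handled by this dispatch
    uint32_t workgroups;   // workgroups launched along the x dimension
};

std::vector<Dispatch> plan_mat_vec_dispatches(uint32_t rows,
                                              uint32_t rows_per_workgroup,
                                              uint32_t max_workgroups_per_dim) {
    std::vector<Dispatch> plan;
    const uint32_t total_workgroups =
        (rows + rows_per_workgroup - 1) / rows_per_workgroup;  // ceiling division
    uint32_t launched = 0;
    while (launched < total_workgroups) {
        const uint32_t chunk =
            std::min(total_workgroups - launched, max_workgroups_per_dim);
        plan.push_back({launched * rows_per_workgroup, chunk});
        launched += chunk;
    }
    return plan;
}

int main() {
    // Example: a 300,000-row weight matrix with one workgroup per row would
    // exceed a 65,535-workgroup limit, so it must be split across dispatches.
    for (const Dispatch& d : plan_mat_vec_dispatches(300000, 1, 65535)) {
        std::printf("dispatch: row_offset=%u workgroups=%u\n",
                    d.row_offset, d.workgroups);
    }
    return 0;
}
```

In a real backend, each planned chunk would be issued as its own dispatch call with the row offset passed to the shader, but how commit b8095 actually resolves the issue is not specified in this summary.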
Why It Matters
The fix matters for developers running LLMs locally through the WebGPU backend, where correctly dispatching large matrix-vector operations is needed for stable GPU acceleration across platforms.