LM Studio 0.4.14 adds MTP speculative decoding for faster local LLMs
Speculative decoding speeds up inference by predicting multiple tokens at once.
LM Studio has added support for MTP (Multi-Token Prediction) speculative decoding in its latest beta release (version 0.4.14 Build 2). The feature is off by default. When enabled, it speeds up local LLM inference by having a draft model predict several tokens ahead, which the main model then checks. To turn it on, users must update their llama.cpp engine to version 2.15.0 and manually select "MTP" under "Manually choose model load parameters" before loading a model.
The update brings a well-known speculative decoding technique to LM Studio's user-friendly interface. MTP works by having a smaller, faster draft model propose several token candidates at once; the main model then verifies them in a single forward pass. Each time the draft's guesses are accepted, the main model produces several tokens for the cost of one step, which reduces the number of autoregressive steps and lowers latency. That is especially useful for real-time chat and coding assistants running locally. The size of the gain depends on how often the main model agrees with the draft, and because the feature is still in beta, performance may vary with the model pair and hardware.
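The draft-and-verify loop described above can be sketched in a few lines of Python. This is only a toy illustration of greedy speculative decoding in general, not LM Studio's or llama.cpp's implementation: the two "models" are hypothetical stand-in functions over integer tokens, the names `draft_model`, `target_model`, `speculative_step`, and `generate` are invented for this example, and the verification loop stands in for what a real engine would do in one batched forward pass.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token context to the next token.
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    """Toy main model (hypothetical): next token is (last + 1) mod 10."""
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    """Toy draft model (hypothetical): agrees with the main model,
    except that it guesses wrong whenever the last token is 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(ctx: List[int], draft: Model, target: Model, k: int) -> List[int]:
    """Run one draft-and-verify round and return the tokens to append."""
    # 1. The draft model proposes k tokens, one after another.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(ctx + proposed))

    # 2. The main model checks each proposal. A real engine scores all
    #    positions in a single forward pass; this loop only mimics the result.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(ctx + accepted)
        if tok != expected:
            # First mismatch: keep the main model's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every proposal matched, so the main model adds one more token.
    accepted.append(target(ctx + accepted))
    return accepted


def generate(prompt: List[int], n_tokens: int, draft: Model, target: Model,
             k: int = 3) -> Tuple[List[int], int]:
    """Generate n_tokens and report how many verify rounds were needed."""
    ctx = list(prompt)
    rounds = 0
    while len(ctx) - len(prompt) < n_tokens:
        ctx.extend(speculative_step(ctx, draft, target, k))
        rounds += 1
    return ctx[len(prompt):len(prompt) + n_tokens], rounds


def greedy(prompt: List[int], n_tokens: int, target: Model) -> List[int]:
    """Baseline: plain one-token-at-a-time decoding with the main model."""
    ctx = list(prompt)
    for _ in range(n_tokens):
        ctx.append(target(ctx))
    return ctx[len(prompt):]


if __name__ == "__main__":
    tokens, rounds = generate([0], 12, draft_model, target_model, k=3)
    print(tokens, "in", rounds, "verify rounds")
    print("matches plain greedy decoding:", tokens == greedy([0], 12, target_model))
```

In this sketch the output is identical to what the main model would produce on its own, because every drafted token is checked and the first mismatch is replaced by the main model's choice. The saving comes from needing fewer verify rounds than tokens generated, and it shrinks as the draft model's guesses are rejected more often.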
- LM Studio 0.4.14 Build 2 (Beta) adds MTP speculative decoding support
- Requires llama.cpp engine 2.15.0 and manual enabling in model load parameters
- Speeds up inference by predicting multiple tokens in parallel, reducing latency
Why It Matters
Faster local AI inference means more responsive apps, lower costs, and better real-time experiences.