Developer Tools

Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

Open-source local AI server loads 120B parameter models and handles 64K+ context windows.

Deep Dive

AMD has launched Lemonade, a fast, open-source local LLM server engineered to harness both GPU and NPU hardware for accelerated AI inference. Built by the local AI community, the lightweight 2MB C++ backend is designed for practical workflows, offering a one-minute install and automatic hardware configuration. It supports running multiple large models simultaneously, including 120B+ parameter models such as GPT-OSS-120B, and can utilize up to 128GB of unified RAM. The server is fully compatible with the OpenAI API standard, so it works out of the box with hundreds of existing applications such as Open WebUI, n8n, and GitHub Copilot.
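Because the server speaks the OpenAI API, any OpenAI-style client can talk to it by swapping the base URL. A minimal sketch of what such a request looks like, using only the standard library; the base URL, port, and model name here are placeholder assumptions, not values from the article, so check Lemonade's docs for the real ones:

```python
import json

# Placeholder local endpoint -- the actual host/port are an assumption.
BASE_URL = "http://localhost:8000/api/v1"

def chat_request(model: str, prompt: str) -> str:
    """Build an OpenAI-style chat-completions request body as JSON."""
    body = {
        "model": model,  # e.g. a model downloaded through Lemonade's GUI
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return json.dumps(body)

payload = chat_request("gpt-oss-120b", "Hello from a local server!")
print(f"POST {BASE_URL}/chat/completions")
print(payload)
```

This body could be sent with `urllib.request` or any HTTP client; since the wire format matches OpenAI's, existing apps only need the base URL pointed at the local server.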

Lemonade provides a unified local service for multiple AI modalities, including chat, vision, image generation, transcription, and speech synthesis, through standard API endpoints. It features a built-in GUI for quickly downloading and switching models and supports advanced configurations such as extending context windows to 64K+ tokens. The platform is cross-platform, offering a consistent experience on Windows, Linux, and macOS (beta), and is built on top of leading inference engines like llama.cpp and Ryzen AI software. By making local AI free, open, fast, and private, Lemonade represents a significant step toward democratizing high-performance, on-device AI for every PC.
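The modalities above correspond to the standard OpenAI endpoint paths that an OpenAI-compatible server exposes. A small sketch of that mapping; the paths come from the OpenAI API convention the article says Lemonade follows, while the base URL is a placeholder assumption:

```python
# Placeholder base URL -- consult Lemonade's docs for the real host/port.
BASE_URL = "http://localhost:8000/api/v1"

# Standard OpenAI-compatible paths, one per modality. Vision typically
# rides on chat/completions via image content in the messages.
ENDPOINTS = {
    "chat": "/chat/completions",
    "image generation": "/images/generations",
    "transcription": "/audio/transcriptions",
    "speech synthesis": "/audio/speech",
}

def endpoint_url(modality: str) -> str:
    """Return the full request URL for a given modality."""
    return BASE_URL + ENDPOINTS[modality]

for name in ENDPOINTS:
    print(f"{name}: {endpoint_url(name)}")
```

Clients built against these conventional paths should work unchanged once pointed at the local base URL.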

Key Points
  • Leverages both GPU and NPU for accelerated inference, running models 2.5x faster locally
  • Open-source 2MB C++ server supports 120B+ parameter models and 64K+ context windows
  • OpenAI API compatible, works with hundreds of apps out-of-the-box and auto-configures for hardware

Why It Matters

Democratizes high-performance, private AI by enabling anyone to run advanced models locally without cloud dependency.