vLLM ROCm has been added to Lemonade as an experimental backend
Now run LLMs directly on AMD GPUs without GGUF conversion.
Lemonade, the open-source AI inference toolkit, has added experimental support for vLLM with ROCm (AMD's GPU compute stack). This lets users run .safetensors models—the raw format used by many modern LLMs—on AMD hardware without first converting them to GGUF. The integration was contributed by community members u/krishna2910-amd, u/mikkoph, and u/sa1sr1, and makes running vLLM as straightforward as using llama.cpp within Lemonade.
The backend is labeled experimental: core functionality works, but there are known rough edges. Users can try it immediately with 'lemonade backends install vllm:rocm' to install the backend, followed by 'lemonade run Qwen3.5-0.8B-vLLM' to run a model on it. The Lemonade team is actively seeking feedback to gauge interest and guide further development. A quick start guide and the GitHub repo provide full details for those wanting to test and contribute.
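The quick start above boils down to a short terminal session. This is a sketch based on the commands quoted in the announcement; since the backend is experimental, exact command names and the model identifier may change:

```shell
# Install the experimental vLLM ROCm backend into Lemonade
lemonade backends install vllm:rocm

# Run a .safetensors model directly on an AMD GPU (no GGUF conversion step)
lemonade run Qwen3.5-0.8B-vLLM
```

See the quick start guide in the GitHub repo for prerequisites such as a working ROCm installation.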
- vLLM ROCm backend enables running .safetensors models on AMD GPUs without GGUF conversion.
- Installation is a single command: 'lemonade backends install vllm:rocm'.
- Experimental release seeks community feedback to determine future investment.
Why It Matters
Running .safetensors models directly lets AMD GPU users experiment with new LLMs faster, removing the GGUF-conversion step and its friction from pre-deployment testing.