v4.7.2
Portable Electron app replaces scripts; tensor parallelism speeds up multi-GPU by 60%+
oobabooga's text-generation-webui (46.9k stars) just dropped v4.7.2 with several major upgrades. The headline feature is a native desktop application: portable builds now bundle Electron and open as a standalone window, replacing the old start scripts. Users launch it with textgen (Linux/macOS) or textgen.bat (Windows), or skip the window entirely with --listen or --nowebui. The UI has been overhauled: it switches to the Inter font and Lucide SVG icons, redesigns the chat input card, and adds a three-button segmented control for chat modes. Sidebar toggle buttons are replaced with thin hairline handles, and the active tab now uses a flat underline indicator.
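For reference, launching the unpacked portable build might look roughly like this. The textgen launcher and the --listen/--nowebui flags are named in the release notes; the paths and flag semantics in the comments follow the project's usual behavior and should be treated as a sketch:

```bash
# Open the standalone Electron window (Linux/macOS)
./textgen

# Same on Windows
textgen.bat

# Skip the Electron window and serve the web UI over the network instead
./textgen --listen

# Run the backend/API only, with no web UI at all
./textgen --nowebui
```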
The biggest performance improvement comes from tensor parallelism in the llama.cpp backend: a new --split-mode flag with a tensor option can make multi-GPU inference over 60% faster than the old row-split method. On the ik_llama.cpp fork, the tensor and row modes fall back to graph mode. The update also swaps DuckDuckGo HTML scraping for the more robust ddgs library, adds support for standalone .jinja/.jinja2 template files, and fixes bugs such as the Stop button being ignored during tool-call approval and race conditions in the ExLlamaV3 backend. Portable builds for Windows, Linux, and macOS cover NVIDIA CUDA (12.4/13.1), AMD/Intel Vulkan, AMD ROCm 7.2, and CPU-only, making local LLM deployment easier than ever.
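As a rough sketch, switching a multi-GPU llama.cpp load to the new mode is a one-flag change. The --split-mode values tensor and row come from the release notes; the --model flag is the web UI's usual model selector, and the model filename here is a placeholder:

```bash
# Previous multi-GPU approach: split weight matrices row-wise across GPUs
./textgen --model your-model.gguf --split-mode row

# New tensor-parallel mode: reported 60%+ faster multi-GPU inference
# (on the ik_llama.cpp fork, both tensor and row fall back to graph mode)
./textgen --model your-model.gguf --split-mode tensor
```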
- Native desktop app using Electron portable builds, replacing manual start scripts; just run textgen (or textgen.bat on Windows)
- Tensor parallelism for llama.cpp via --split-mode tensor delivers 60%+ faster multi-GPU inference
- UI overhaul with Inter font, Lucide icons, redesigned chat input, and improved tab indicators
Why It Matters
Makes running LLMs locally much more accessible and performant, especially for multi-GPU setups.