Developer Tools

v4.1

The popular open-source LLM UI now lets models call Python functions and fully complies with OpenAI's API spec.

Deep Dive

Oobabooga has launched version 4.1 of text-generation-webui, the widely used open-source interface for running large language models locally. The headline feature is native tool-calling, which lets models execute custom Python functions defined in simple .py files. Five example tools are included: web_search, fetch_webpage, calculate, get_datetime, and roll_dice. During streaming chats, tool calls appear as collapsible accordions showing the function, arguments, and output, mirroring capabilities previously seen in commercial APIs like OpenAI's.
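
The release note doesn't spell out the tool-file convention, but a tool is described as just a Python function in its own file. The following is a minimal sketch of what such a file might look like, under the assumption (ours, not the project's documentation) that discovery happens by placing the file in the tools folder and that the type hints and docstring supply the schema shown to the model:

```python
# tools/roll_dice.py -- hypothetical sketch of a tool file; the exact
# discovery and schema convention text-generation-webui uses may differ.
import random

def roll_dice(sides: int = 6, count: int = 1) -> str:
    """Roll `count` dice with `sides` sides each and return the results.

    We assume the docstring and type hints are what the model sees when
    deciding whether and how to call this tool.
    """
    rolls = [random.randint(1, sides) for _ in range(count)]
    return f"Rolled {rolls} (total: {sum(rolls)})"
```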

The update also brings major improvements to API compatibility, making the local server fully compliant with the OpenAI API specification. This includes proper support for logprobs across backends (llama.cpp, ExLlamaV3, Transformers), a dedicated reasoning_content field for thinking blocks, and correct handling of the tool_calls and tool_choice parameters. For users, this means client applications built for OpenAI's API can switch to a local model without code changes. On the performance side, DOM updates are optimized for smoother streaming; the UI also gains an 'incognito chat' mode that lives only in RAM and a context size slider that now reaches 1 million tokens.
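
Because the server speaks the OpenAI wire format, switching an existing client is essentially a base-URL change. A minimal sketch using the official openai Python package, where the port (5000) and the placeholder model name are assumptions rather than details from the release note:

```python
from openai import OpenAI

# Point the stock OpenAI client at the local server. The API key is
# unused locally, but the client requires some value.
client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="sk-local")

resp = client.chat.completions.create(
    model="local-model",  # placeholder; the server uses whichever model is loaded
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    logprobs=True,        # per the release notes, now supported across backends
    top_logprobs=3,
)

msg = resp.choices[0].message
print(msg.content)
# v4.1 reportedly returns thinking blocks in a separate field:
print(getattr(msg, "reasoning_content", None))
```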

Under the hood, v4.1 refactors reasoning extraction into a standalone module supporting multiple model formats (Qwen, GPT-OSS, Solar, seed:think). It removes legacy RoPE scaling parameters for modern 128k+ context models, sets smarter defaults for llama.cpp, and introduces a new 'Top-P' sampler preset. The one-click installer is also optimized to download only changed components, speeding up updates. This release solidifies text-generation-webui as a production-ready platform for deploying and experimenting with open-weight LLMs.
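
The note doesn't show the new module's interface, but the core idea of format-aware reasoning extraction can be sketched as a table of per-format tag pairs. The tag strings and format names below are illustrative assumptions, not the project's actual definitions:

```python
import re

# Illustrative tag pairs only; the real module's formats and delimiters differ.
REASONING_TAGS = {
    "qwen": ("<think>", "</think>"),
    "solar": ("<thought>", "</thought>"),
}

def split_reasoning(text: str, fmt: str) -> tuple[str, str]:
    """Return (reasoning, visible_answer) for a model output in format `fmt`."""
    start, end = REASONING_TAGS[fmt]
    m = re.search(re.escape(start) + r"(.*?)" + re.escape(end), text, re.DOTALL)
    if not m:
        return "", text
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer

print(split_reasoning("<think>2+2=4</think>The answer is 4.", "qwen"))
# -> ('2+2=4', 'The answer is 4.')
```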

Key Points
  • Native tool-calling: Models can now execute custom Python functions (web_search, calculate, etc.) defined in the /tools folder, with UI elements displaying calls in real time.
  • Full OpenAI API spec compliance: Adds logprobs support, proper tool_calls format, reasoning_content field, and stream_options, enabling drop-in replacement for OpenAI endpoints (see the round-trip sketch after this list).
  • New 'incognito chat' mode and 1M token context slider: Adds temporary RAM-only chats and increases maximum context length support in the UI.
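
To make the first two points concrete, here is a sketch of the standard OpenAI-style tool-call round trip against the local server. The endpoint, model name, and the exact parameters of the bundled calculate tool are assumptions for illustration:

```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="sk-local")

# Assumed schema for the bundled `calculate` tool; its real parameters
# may differ.
tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate an arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

messages = [{"role": "user", "content": "What is 17 * 23?"}]
resp = client.chat.completions.create(model="local-model",
                                      messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]

# Execute the function locally and hand the result back, per the spec.
args = json.loads(call.function.arguments)
result = str(eval(args["expression"], {"__builtins__": {}}))  # demo only
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

final = client.chat.completions.create(model="local-model", messages=messages)
print(final.choices[0].message.content)
```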

Why It Matters

This turns local LLMs into actionable agents and provides a fully compatible, private alternative to commercial API services for developers.