Release v5.6.0
The latest release introduces four new specialized models for document AI, privacy, segmentation, and table recognition.
Hugging Face has released version 5.6.0 of the Transformers library, a significant update that introduces four new specialized AI models. The standout addition is Baidu's Qianfan-OCR, a 4-billion-parameter end-to-end document intelligence model that performs direct image-to-text conversion, eliminating traditional multi-stage OCR pipelines. Through its "Layout-as-Thought" capability, it supports prompt-driven tasks such as table extraction, chart understanding, and document Q&A. Another major inclusion is OpenAI's Privacy Filter, a bidirectional token-classification model designed for on-premises, high-throughput detection and masking of personally identifiable information (PII) across eight privacy categories.
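To make the masking behavior concrete, here is a toy sketch of detect-and-mask PII redaction. This is not the Privacy Filter itself (which uses a neural token classifier); the regex patterns and category names below are illustrative placeholders, not the model's actual taxonomy.

```python
import re

# Toy sketch of PII masking: each detected span is replaced with a
# [CATEGORY] placeholder, mirroring the kind of output a
# token-classification privacy model produces. Patterns and category
# names are illustrative assumptions, not the Privacy Filter's own.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace every detected PII span with its category placeholder."""
    for category, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{category}]", text)
    return text

masked = mask_pii("Contact jane.doe@example.com or 555-123-4567.")
print(masked)  # Contact [EMAIL] or [PHONE].
```

A real deployment would run the model's classifier over tokens rather than regexes, but the redaction contract is the same: text in, masked text out, with no data leaving the premises.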
The release also features two efficiency-focused models: SAM3-LiteText, a lightweight variant that reduces text encoder parameters by 88% for vision-language segmentation, and SLANet, a CPU-friendly model from Baidu's PaddlePaddle team for fast table structure recognition. Beyond new models, v5.6.0 brings breaking changes to the internal `rotary_fn` and major enhancements to the `transformers serve` command. The serving updates include a new `/v1/completions` endpoint for legacy OpenAI-style completions, multimodal support for audio and video inputs, improved tool-calling via `parse_response`, and better error handling for model mismatches.
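As a sketch of the new legacy-style endpoint, the request below follows the OpenAI completions schema that `/v1/completions` accepts. The server address and model name are illustrative assumptions, not values from the release notes.

```python
import json

# Build a request body for the new `/v1/completions` endpoint in
# `transformers serve`, using the legacy OpenAI completions schema.
# The model name here is an illustrative assumption: any model the
# running server can load would work.
payload = {
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "prompt": "Summarize the v5.6.0 release in one sentence:",
    "max_tokens": 64,
    "temperature": 0.2,
}
body = json.dumps(payload)

# To send it against a local server (address assumed for illustration):
#   curl http://localhost:8000/v1/completions \
#     -H "Content-Type: application/json" -d "$BODY"
print(body)
```

The same server also accepts the chat-style `/v1/chat/completions` requests it supported before; the legacy endpoint exists so older OpenAI-client code can point at `transformers serve` without changes.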
- Adds Baidu's Qianfan-OCR, a 4B-parameter model for unified document parsing and image-to-text conversion.
- Introduces OpenAI Privacy Filter for on-premises, high-speed PII detection and masking across eight categories.
- Enhances `transformers serve` with a legacy completions endpoint and multimodal audio/video input support.
Why It Matters
v5.6.0 gives developers production-ready, specialized models for document intelligence, data privacy, and efficient multimodal tasks, directly within the popular Transformers ecosystem.