Open Source

ibm-granite/granite-4.0-3b-vision · Hugging Face

IBM's new 3B-parameter vision model converts charts to code and tables to JSON, targeting enterprise document extraction.

Deep Dive

IBM has launched Granite-4.0-3B-Vision, a specialized 3-billion-parameter vision-language model (VLM) designed for complex document extraction tasks that often challenge smaller models. The model targets three tasks: converting charts into structured formats such as CSV, code, or summaries; extracting tables with intricate layouts into JSON, HTML, or OTSL; and performing semantic key-value pair (KVP) extraction, driven by key names and descriptions, across varied document designs. This focus positions it as an enterprise tool for automating data capture from financial reports, scientific papers, and business documents.
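
To make the KVP idea concrete, here is a minimal sketch of how a caller might describe the fields it wants and check a JSON reply. Everything in it is an assumption for illustration: the `INVOICE_KEYS` schema, the prompt wording, and the helper names are hypothetical and are not the model's documented prompt format or output contract.

```python
import json
from typing import Dict, Optional

# Hypothetical schema: each key name is paired with a short description,
# mirroring the "key names and descriptions" idea described above.
INVOICE_KEYS: Dict[str, str] = {
    "invoice_number": "Unique identifier printed on the invoice",
    "total_due": "Final amount payable, including tax",
}


def build_kvp_prompt(keys: Dict[str, str]) -> str:
    """Render the schema as a plain-text instruction (assumed wording)."""
    lines = ["Extract the following fields and reply with a JSON object:"]
    for name, description in keys.items():
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)


def parse_kvp_reply(reply: str, keys: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Keep only the requested keys; keys missing from the reply become None."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {name: data.get(name) for name in keys}
```

Filtering the reply against the requested keys keeps downstream code from depending on fields the caller never asked for, and makes missing values explicit rather than silently absent.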

Architecturally, the model is delivered as a LoRA (Low-Rank Adaptation) adapter on top of the existing Granite 4.0 Micro base model. This setup lets a single deployment handle both multimodal document understanding and text-only workloads: the base model processes text requests without loading the vision adapter, which saves resources. It builds on its predecessor, Granite-Vision-3.3 2B, and keeps backward compatibility for existing users. The model can be used standalone or integrated with IBM's Docling toolkit to add visual comprehension to document processing pipelines, going beyond plain OCR to an understanding of document structure and content.
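
The adapter arrangement can be illustrated with a toy layer. The sketch below is not IBM's implementation: it uses plain Python lists instead of a tensor library, and the `LoRALinear` class, the `needs_vision_adapter` helper, and the `"images"` request field are all hypothetical names. It only shows the general LoRA idea, where a frozen base weight W is augmented by a low-rank product B·A, and where skipping that product leaves the base model's output untouched.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Toy linear layer: y = W x, plus scale * B (A x) when the adapter is on."""

    def __init__(self, weight: Matrix, lora_a: Matrix, lora_b: Matrix,
                 scale: float = 1.0) -> None:
        self.weight = weight  # frozen base weight, shape (out, in)
        self.lora_a = lora_a  # low-rank down-projection, shape (r, in)
        self.lora_b = lora_b  # low-rank up-projection, shape (out, r)
        self.scale = scale

    def forward(self, x: Vector, use_adapter: bool) -> Vector:
        base = matvec(self.weight, x)
        if not use_adapter:
            # Text-only path: the base weights alone produce the output.
            return base
        delta = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


def needs_vision_adapter(request: dict) -> bool:
    """Hypothetical routing rule: use the adapter only when images are attached."""
    return bool(request.get("images"))


layer = LoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],        # rank r = 1
    lora_b=[[0.5], [-0.5]],
)
text_out = layer.forward([2.0, 3.0], needs_vision_adapter({"text": "hi"}))
vision_out = layer.forward([2.0, 3.0], needs_vision_adapter({"images": ["page.png"]}))
```

Because the base weight is never modified, the same layer serves both kinds of request; only the extra low-rank term differs between the two paths.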

Key Points
  • Specializes in chart-to-CSV/code conversion, complex table-to-JSON extraction, and semantic key-value pair extraction from diverse documents.
  • Uses a LoRA adapter on the Granite 4.0 Micro base, allowing one deployment for both vision and text-only tasks.
  • Integrates with Docling to add visual understanding to document processing pipelines, and stays backward compatible with its predecessor, Granite-Vision-3.3 2B.

Why It Matters

Automates the tedious, error-prone task of manually extracting data from complex charts and tables in business and research documents.