Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents
IBM's new 3-billion-parameter VLM uses a novel variant of the DeepStack architecture to parse complex documents with high accuracy.
IBM has released Granite 4.0 3B Vision, a compact 3-billion-parameter vision-language model (VLM) purpose-built for enterprise document understanding. The model is designed to reliably extract information from complex documents, forms, and structured visuals, with targeted capabilities in table extraction, chart understanding, and semantic key-value pair parsing. A key innovation is its modular design: it ships as a LoRA adapter on top of the text-only Granite 4.0 Micro model, so a single deployment can serve both multimodal and text-only workloads.
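LoRA adds a small low-rank update to frozen base weights, so switching the update off recovers the unmodified base model. The sketch below illustrates that idea for a single linear layer, assuming the standard LoRA formulation y = Wx + (alpha / r) * B(Ax). The class name `LoRALinear` and all values are hypothetical toy numbers. This is not how Granite 4.0 3B Vision is actually packaged or served.

```python
# Minimal sketch of a LoRA-adapted linear layer (toy values, hypothetical names).
# Assumption: standard LoRA, y = W x + (alpha / r) * B (A x), with W frozen.

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    def __init__(self, base_weight, lora_a, lora_b, alpha):
        self.w = base_weight               # frozen base weight, shape (d_out, d_in)
        self.a = lora_a                    # low-rank down-projection, shape (r, d_in)
        self.b = lora_b                    # low-rank up-projection, shape (d_out, r)
        self.scale = alpha / len(lora_a)  # alpha / r
        self.enabled = True                # toggling this never modifies W

    def __call__(self, x):
        y = matvec(self.w, x)              # base-model path
        if self.enabled:
            delta = matvec(self.b, matvec(self.a, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y


layer = LoRALinear(
    base_weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],  # rank r = 1
    lora_b=[[0.5], [0.0]],
    alpha=2.0,
)
x = [1.0, 2.0]
print(layer(x))  # adapter on:  [4.0, 2.0]
layer.enabled = False
print(layer(x))  # adapter off: [1.0, 2.0], identical to the base layer
```

Because the base weights are shared, one set of weights can in principle serve both modes; only the small A and B matrices differ between them.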
Performance is driven by three core technical investments, two of which stand out. The first is ChartNet, a novel million-scale multimodal dataset for chart interpretation that contains 1.7 million diverse synthetic chart samples, each with aligned components such as plotting code and data tables. The second is a novel variant of the DeepStack architecture, which injects abstract visual features into earlier model layers for semantic understanding and high-resolution spatial features into later layers to preserve critical layout details. This lets the model capture both the 'what' and the 'where' of a document, which is essential for tasks such as accurately reading values off a line chart or parsing a multi-column table.
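Layer-wise visual feature injection can be sketched as a toy pure-Python program. The sketch makes several assumptions. Injection is a residual add at the visual-token positions. Each "decoder layer" is a stand-in function. "Semantic" features go to the first few layers and "spatial" features to the last few. The helper names (`run_with_injection`, `build_schedule`) and the layer schedule are hypothetical and are not Granite's actual configuration.

```python
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


def run_with_injection(hidden: List[Vector], layers: List[Layer],
                       visual_positions: List[int],
                       injections: Dict[int, List[Vector]]) -> List[Vector]:
    """Run the layer stack, residually adding visual features before chosen layers.

    injections maps a layer index to one feature vector per visual position;
    text-token positions are never modified by the injection step.
    """
    for i, layer in enumerate(layers):
        if i in injections:
            hidden = list(hidden)  # copy so the caller's list is not mutated
            for pos, feat in zip(visual_positions, injections[i]):
                hidden[pos] = add(hidden[pos], feat)
        hidden = layer(hidden)
    return hidden


def build_schedule(num_layers: int, semantic: List[Vector], spatial: List[Vector],
                   n_early: int = 2, n_late: int = 2) -> Dict[int, List[Vector]]:
    """Hypothetical schedule: semantic features early, spatial features late."""
    schedule = {i: semantic for i in range(n_early)}
    schedule.update({i: spatial for i in range(num_layers - n_late, num_layers)})
    return schedule


def identity(h: List[Vector]) -> List[Vector]:
    return h  # stand-in for a real transformer block


layers = [identity] * 6
hidden = [[0.0, 0.0] for _ in range(4)]  # 4 tokens, hidden size 2
visual_positions = [1, 2]                # tokens 1 and 2 are image tokens
semantic = [[1.0, 0.0], [1.0, 0.0]]
spatial = [[0.0, 0.5], [0.0, 0.5]]

schedule = build_schedule(len(layers), semantic, spatial)
out = run_with_injection(hidden, layers, visual_positions, schedule)
print(sorted(schedule))  # [0, 1, 4, 5]
print(out)  # [[0.0, 0.0], [2.0, 1.0], [2.0, 1.0], [0.0, 0.0]]
```

In a real model the stand-in layers would be attention and MLP blocks. Routing different feature levels to different depths is the design choice the article describes.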
- Modular LoRA adapter design lets the same Granite 4.0 Micro base serve both vision-language and text-only tasks.
- Trained on the new 'ChartNet' dataset of 1.7 million synthetically generated charts to improve chart-to-code conversion and chart data extraction.
- Uses a novel variant of the DeepStack architecture for targeted visual feature injection, routing high-level semantics and fine-grained spatial details to different layers.
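To illustrate what "aligned components" means for a synthetic chart sample, here is a minimal sketch. It generates one chart's data table, the plotting code that would render it, and a CSV view of the same data. It then checks that the code and the table agree. The schema (`ChartSample`, its fields) and the generator are hypothetical; ChartNet's actual format and generation pipeline are not described here. The generated code targets matplotlib, but the check parses it with `ast` rather than running it.

```python
import ast
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChartSample:
    """One synthetic chart with aligned views of the same data (hypothetical schema)."""
    chart_type: str
    title: str
    data_table: List[Tuple[str, float]]  # (category, value) rows
    plotting_code: str                   # code that would render the chart
    csv: str                             # the same table serialized as CSV


def make_sample(seed: int) -> ChartSample:
    rng = random.Random(seed)  # seeded so each sample is reproducible
    chart_type = rng.choice(["bar", "line"])
    labels = [f"Q{i}" for i in range(1, 5)]
    values = [round(rng.uniform(10, 100), 1) for _ in labels]
    title = f"Quarterly revenue (sample {seed})"
    plot_fn = "bar" if chart_type == "bar" else "plot"
    code = (
        "import matplotlib.pyplot as plt\n"
        f"labels = {labels!r}\n"
        f"values = {values!r}\n"
        f"plt.{plot_fn}(labels, values)\n"
        f"plt.title({title!r})\n"
        "plt.savefig('chart.png')\n"
    )
    rows = list(zip(labels, values))
    csv = "category,value\n" + "\n".join(f"{c},{v}" for c, v in rows)
    return ChartSample(chart_type, title, rows, code, csv)


def table_from_code(code: str) -> List[Tuple[str, float]]:
    """Recover (label, value) rows from the generated code without executing it."""
    assigned = {}
    for node in ast.parse(code).body:
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            name = node.targets[0].id
            if name in ("labels", "values"):
                assigned[name] = ast.literal_eval(node.value)
    return list(zip(assigned["labels"], assigned["values"]))


sample = make_sample(seed=7)
print(table_from_code(sample.plotting_code) == sample.data_table)  # True
print(sample.csv.splitlines()[0])  # category,value
```

Keeping the code, table, and rendered image derived from one source of truth is what makes such samples usable as supervision for chart-to-code and chart-to-table tasks.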
Why It Matters
Enables enterprises to automate the extraction of structured data from complex reports, financial charts, and forms with a single, efficient model.