4B parameter VLM based on Qwen3.5-4B with Apache-2.0 license?

4B parameter VLM based on Qwen3.5-4B with Apache-2.0 license

Converts document images to Markdown and extracts structured data via JSON templates?

Converts document images to Markdown and extracts structured data via JSON templates

Self-hostable with 4GB VRAM; supports GPTQ, GGUF, MLX, and multiple quantizations?

Self-hostable with 4GB VRAM; supports GPTQ, GGUF, MLX, and multiple quantizations

Research & Papers

Numind releases NuExtract3: open 4B VLM for document extraction

r/MachineLearning May 22, 2026

⚡New open-weight model handles PDFs, invoices, and tables locally.

Deep Dive

Numind, the company behind the open-weight model, has released NuExtract3, a 4B parameter vision-language model based on Qwen3.5-4B and licensed under Apache-2.0. Designed for practical information extraction, it handles complex visually structured inputs including PDFs, screenshots, forms, tables, receipts, invoices, and multi-page documents. The model can convert document images to Markdown and extract structured data using a target JSON template, making it a versatile tool for document processing pipelines. It was trained on a node of 8xH100 for three days, allowing it to handle long documents effectively, though for Markdown tasks page-by-page processing is recommended for optimal speed.

NuExtract3 is easy to self-host with comprehensive documentation and multiple weight formats including Safetensors, GGUF, and MLX. It requires as little as 4GB of VRAM and supports various quantizations (GPTQ, W8A8, FP8, Q4, Q6). The model works well with vLLM, SGLang, and llama.cpp, offering a local/open-weight alternative for document extraction. Numind provides a free Hugging Face space for testing without sign-up, and a blog post with detailed model card. The source code and weights are available on Hugging Face, and a peer-reviewed paper is forthcoming.

Key Points

4B parameter VLM based on Qwen3.5-4B with Apache-2.0 license
Converts document images to Markdown and extracts structured data via JSON templates
Self-hostable with 4GB VRAM; supports GPTQ, GGUF, MLX, and multiple quantizations

Why It Matters

Enables local, private document extraction without cloud services, ideal for sensitive data workflows.

Read Original Article

Numind releases NuExtract3: open 4B VLM for document extraction

Why It Matters

Related Articles

🚀 Stay Ahead in AI