Numind releases NuExtract3: open 4B VLM for document extraction
New open-weight model handles PDFs, invoices, and tables locally.
Numind, the company behind the open-weight model, has released NuExtract3, a 4B parameter vision-language model based on Qwen3.5-4B and licensed under Apache-2.0. Designed for practical information extraction, it handles complex visually structured inputs including PDFs, screenshots, forms, tables, receipts, invoices, and multi-page documents. The model can convert document images to Markdown and extract structured data using a target JSON template, making it a versatile tool for document processing pipelines. It was trained on a node of 8xH100 for three days, allowing it to handle long documents effectively, though for Markdown tasks page-by-page processing is recommended for optimal speed.
NuExtract3 is easy to self-host with comprehensive documentation and multiple weight formats including Safetensors, GGUF, and MLX. It requires as little as 4GB of VRAM and supports various quantizations (GPTQ, W8A8, FP8, Q4, Q6). The model works well with vLLM, SGLang, and llama.cpp, offering a local/open-weight alternative for document extraction. Numind provides a free Hugging Face space for testing without sign-up, and a blog post with detailed model card. The source code and weights are available on Hugging Face, and a peer-reviewed paper is forthcoming.
- 4B parameter VLM based on Qwen3.5-4B with Apache-2.0 license
- Converts document images to Markdown and extracts structured data via JSON templates
- Self-hostable with 4GB VRAM; supports GPTQ, GGUF, MLX, and multiple quantizations
Why It Matters
Enables local, private document extraction without cloud services, ideal for sensitive data workflows.