Internalized Reasoning for Long-Context Visual Document Understanding
New method makes AI reason internally on long documents, cutting output tokens by over 12x.
Researcher Austin Veselka has published a novel AI training technique called 'Internalized Reasoning for Long-Context Visual Document Understanding.' The core innovation is a synthetic data pipeline that teaches AI models to perform reasoning internally before generating a final answer. The pipeline works by having a model score document pages for relevance to a question, extract textual evidence from the most relevant pages, and order that evidence. The resulting 'thinking trace' is then used for Supervised Fine-Tuning (SFT), placed inside special <think> tags and gated by a <cot> (chain-of-thought) token. The reasoning capability is then 'internalized' using a low-strength model merging technique.
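The pipeline steps above can be sketched roughly as follows. This is a minimal illustration, not the released code: the function names, the `scorer` and `extractor` callables, the top-k selection, and the simple linear weight interpolation are all assumptions standing in for the actual implementation details.

```python
# Hypothetical sketch of the synthetic reasoning-trace pipeline.
# All names are illustrative, not taken from the released code.

def score_pages(pages, question, scorer):
    # Step 1: score each document page for relevance to the question.
    return {i: scorer(page, question) for i, page in enumerate(pages)}

def build_trace(pages, question, scorer, extractor, top_k=3):
    scores = score_pages(pages, question, scorer)
    # Keep the top-k most relevant pages, ordered by descending score.
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # Step 2: extract textual evidence from each selected page.
    evidence = [extractor(pages[i], question) for i in ranked]
    # Step 3: wrap the ordered evidence in a <think> span, gated by
    # the <cot> control token, for use as an SFT training target.
    return "<cot><think>\n" + "\n".join(evidence) + "\n</think>"

def merge_weights(base, tuned, alpha=0.1):
    # Low-strength merge (assumed linear interpolation): blend the
    # fine-tuned weights into the base model with a small alpha to
    # "internalize" the reasoning behaviour.
    return {k: (1 - alpha) * base[k] + alpha * tuned[k] for k in base}
```

For example, with a keyword-match `scorer` and an identity `extractor`, `build_trace` selects only the pages mentioning the question's topic and emits them inside the `<cot><think>…</think>` span.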
The method was tested on two prominent open-source vision-language models: Qwen3 VL 32B and Mistral Small 3.1 24B. The results are striking. The enhanced 32-billion-parameter Qwen3 model scored 58.3 on the MMLongBenchDoc benchmark, edging out a roughly seven-times-larger 235-billion-parameter Qwen3 variant (57.0). With Mistral, the synthetic reasoning approach outperformed a simpler distillation method by 3.8 points. Crucially, internalized reasoning yields dramatically more efficient outputs: models using this method produced an average of 12.4 times fewer output tokens than models that reason explicitly step-by-step in their responses. The entire pipeline has been released publicly for reproducibility.
- A 32B parameter Qwen3 VL model, trained with Internalized Reasoning, outperformed a 7x larger 235B model on the MMLongBenchDoc benchmark (58.3 vs 57.0).
- The technique reduces verbose reasoning, resulting in AI outputs that are 12.4 times more concise on average compared to standard chain-of-thought methods.
- The synthetic training pipeline is open-source, enabling others to apply internalized reasoning to different models and long-context tasks.
Why It Matters
Enables faster, cheaper AI for analyzing lengthy legal, scientific, and business documents without sacrificing accuracy.