[P] ColQwen3.5-v2 4.5B is out!
The 4.5B parameter model achieves a 0.6177 nDCG@10 score on ViDoRe V3, claiming the top spot.
Athrael-Soju has released ColQwen3.5-v2, a 4.5B parameter model for visual document retrieval (VDR): searching scanned PDFs, images of forms, and other image-based documents directly from their page images. Built on the Qwen3.5-4B foundation and using the ColPali late-interaction architecture, the model has taken the number one position on the ViDoRe V3 benchmark with an nDCG@10 score of 0.6177. It also scores 0.9172 nDCG@5 on the ViDoRe V1 benchmark, the best result in the 4B parameter class. On one key metric, the release narrows the gap to the leading TomoroAI model from 0.010 to 0.002.
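For readers unfamiliar with late interaction, the scoring idea behind ColPali-style models can be sketched in a few lines. This is a minimal illustration, not the model's actual code: the function names and toy vectors are hypothetical, and real embeddings come from the model and have many more dimensions. Each query token keeps its own vector, each page keeps one vector per image patch, and the page score is the sum over query tokens of the best-matching patch similarity (often called MaxSim).

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token, take its best patch match, then sum."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector], pages: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return page ids sorted from highest to lowest MaxSim score."""
    scores = {page_id: maxsim_score(query_vecs, vecs) for page_id, vecs in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): two query tokens, two pages with two patches each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "invoice": [[1.0, 0.0], [0.0, 1.0]],  # one patch matches each query token exactly
    "memo": [[0.5, 0.5], [0.5, 0.5]],     # every patch is only a partial match
}
ranking = rank_pages(query, pages)
```

Because matching happens per token and per patch, a page can score well when different regions of it answer different parts of the query, which is one reason this style of model suits forms and tables.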
The key advancement in v2 is a simpler and more effective training process. The team replaced a four-phase recipe with a two-phase one. Domain data covering finance and tables was included from the start, and a single set of 'hard negatives' (examples that are difficult for the model to tell apart from correct matches) was mined once and reused. The final model is a weighted ensemble, 'souped' from the new v2 and the previous v1 at a 55/45 ratio. Despite using fewer random seeds (3 vs. 4), the simpler method yielded better results. The model is released under the permissive Apache 2.0 license and is available for download on Hugging Face.
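The 'souping' step can be pictured as a weighted average of two checkpoints' parameters. The sketch below rests on stated assumptions: it treats each checkpoint as a plain dictionary of parameter lists, the `soup` function and the toy values are hypothetical, and it assumes the 55% weight goes to v2, which the summary does not spell out. The author's actual merging code may differ.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]


def soup(state_a: StateDict, state_b: StateDict, weight_a: float = 0.55) -> StateDict:
    """Linearly interpolate two checkpoints that share the same parameter names and shapes."""
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must contain the same parameter names")
    weight_b = 1.0 - weight_a
    merged: StateDict = {}
    for name in state_a:
        if len(state_a[name]) != len(state_b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            weight_a * a + weight_b * b
            for a, b in zip(state_a[name], state_b[name])
        ]
    return merged


# Toy checkpoints (hypothetical values); assumes v2 receives the 0.55 weight.
v2_weights = {"layer.weight": [1.0, 2.0]}
v1_weights = {"layer.weight": [0.0, 0.0]}
souped = soup(v2_weights, v1_weights, weight_a=0.55)
```

Averaging in weight space only makes sense when both checkpoints share an architecture and a common starting point, which holds here since both versions build on the same base model.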
- Achieves top leaderboard score of 0.6177 nDCG@10 on the ViDoRe V3 visual document retrieval benchmark.
- Uses a simplified two-phase training recipe, down from four phases, with finance and table data included from the start.
- Released under the Apache 2.0 license on Hugging Face, permitting free commercial use in document search applications.
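For context on the headline metric, nDCG@10 compares the quality of the top ten results against the best possible ordering of the same documents. The sketch below is a simplified illustration and not the benchmark's evaluation code: the function names are hypothetical, it uses linear gain, and it assumes the input list holds the relevance grades of every judged document in ranked order.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: each grade is divided by log2(position + 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents, so the score is defined as zero here
    return dcg_at_k(ranked_relevances, k) / ideal


# The one relevant page is ranked first in the first case and second in the other.
perfect = ndcg_at_k([1, 0, 0])
second_place = ndcg_at_k([0, 1, 0])
```

A score of 1.0 means every query returned its relevant pages in the best possible order, so a benchmark average of 0.6177 shows ViDoRe V3 remains far from solved.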
Why It Matters
Provides open-source search over image-based documents such as invoices, reports, and forms, which can reduce manual lookup and data extraction.