NovaLAD: A Fast, CPU-Optimized Document Extraction Pipeline for Generative AI and Data Intelligence
Aman Ulla's new system beats commercial parsers on DP-Bench while running entirely on CPU hardware.
Researcher Aman Ulla has published a paper introducing NovaLAD, a document parsing system that transforms unstructured documents such as PDFs and scans into structured, layout-aware representations for generative AI and data intelligence workflows. The system targets a critical bottleneck in retrieval-augmented generation (RAG) and knowledge base construction: a fast, CPU-optimized pipeline that runs two YOLO object detection models concurrently, one for semantic elements and one for structural layout, alongside rule-based grouping and optional vision-language enhancement. A key cost-saving innovation is its image-analysis gate: a Vision Transformer (ViT) classifier first filters out irrelevant images, so only useful ones are sent to a Vision LLM for title generation and summarization.
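The ViT-before-VLM gating described above can be sketched as a simple filter step. Everything here is an illustrative assumption, not the paper's implementation: the class names, the confidence threshold, and the stub classifier merely stand in for the real ViT model.

```python
# Hypothetical sketch of NovaLAD-style image gating: a cheap classifier
# decides whether an extracted image is worth sending to a Vision LLM.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ExtractedImage:
    page: int
    data: bytes  # raw image bytes as pulled from the document

def gate_images(
    images: List[ExtractedImage],
    classify: Callable[[ExtractedImage], Tuple[str, float]],
    keep_labels: frozenset = frozenset({"chart", "diagram", "photo"}),
    min_conf: float = 0.6,
) -> Tuple[List[ExtractedImage], List[ExtractedImage]]:
    """Split images into (send_to_vlm, discard) using a cheap classifier."""
    keep, drop = [], []
    for img in images:
        label, conf = classify(img)
        (keep if label in keep_labels and conf >= min_conf else drop).append(img)
    return keep, drop

# Stub standing in for the ViT classifier: tiny images are treated as
# decorative (logos, separators) and filtered out before any VLM call.
def stub_vit(img: ExtractedImage) -> Tuple[str, float]:
    return ("chart", 0.9) if len(img.data) > 1024 else ("decoration", 0.95)

imgs = [ExtractedImage(1, b"\x00" * 2048), ExtractedImage(2, b"\x00" * 16)]
to_vlm, skipped = gate_images(imgs, stub_vit)
```

The point of the design is that the classifier is orders of magnitude cheaper than a Vision LLM call, so paying it on every image saves cost overall when most images are decorative.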
The architecture is built for parallel execution on CPU, handling detection, classification, OCR, and conversion simultaneously and outputting multiple formats, including structured JSON, Markdown, and knowledge graphs. On the DP-Bench benchmark (upstage/dp-bench), NovaLAD achieved 96.49% TEDS (Tree Edit Distance-based Similarity) and 98.51% NID (Normalized Information Distance), outperforming both commercial and open-source parsers. Combined with the GPU-free design, this performance makes NovaLAD an accessible, efficient tool for enterprises that need to preprocess large document volumes for downstream AI applications without significant hardware investment. The paper details the full data flow and architecture, positioning NovaLAD as a practical solution for scalable document intelligence.
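The parallel-on-CPU idea can be illustrated with a thread pool running independent per-page stages concurrently and merging their results. The stage functions below are trivial stand-ins for NovaLAD's actual models, and all names are assumptions for illustration only:

```python
# Illustrative sketch: independent stages (semantic detection, layout
# detection, OCR) run concurrently per page on CPU via a thread pool,
# then their results are merged into a structured record.
from concurrent.futures import ThreadPoolExecutor

def detect_elements(page):  # stand-in for the semantic YOLO model
    return [{"type": "table", "bbox": [0, 0, 100, 40]}]

def detect_layout(page):    # stand-in for the structural-layout YOLO model
    return [{"region": "body", "bbox": [0, 0, 100, 200]}]

def run_ocr(page):          # stand-in for the OCR stage
    return "Quarterly revenue by segment"

def parse_page(page):
    """Run the three stages concurrently and merge into one record."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        f_elems = pool.submit(detect_elements, page)
        f_layout = pool.submit(detect_layout, page)
        f_text = pool.submit(run_ocr, page)
        elems, layout, text = f_elems.result(), f_layout.result(), f_text.result()
    return {"elements": elems, "layout": layout, "text": text}

def to_markdown(parsed):
    """One possible downstream conversion of the merged record."""
    return f"## Page\n\n{parsed['text']}\n"

result = parse_page("page-1")
```

Threads suffice here because the heavy stages in a real pipeline release the GIL inside native inference and OCR code; a process pool would be the alternative for pure-Python workloads.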
- Uses dual YOLO models for concurrent element & layout detection, achieving 96.49% TEDS on DP-Bench
- CPU-optimized pipeline eliminates GPU dependency, enabling parallel execution of detection, OCR, and conversion
- Integrates ViT classifier to filter images, reducing Vision LLM costs by only processing relevant visual content
Why It Matters
Enables efficient, large-scale document preprocessing for RAG and knowledge bases without expensive GPU hardware, lowering AI implementation costs.