TurboOCR: 270–1200 img/s OCR with Paddle + TensorRT (C++/CUDA, FP16) [P]

TurboOCR reports 270 to 1,200 images per second, with the top figure reached on sparse pages.

Deep Dive

TurboOCR, created by AIPtimizer, is an OCR tool that pairs Paddle models with a C++/CUDA pipeline and FP16 TensorRT inference. Where PaddleOCR is said to process around 15 images per second on high-end GPUs, TurboOCR reaches up to 1,200 images per second on sparse pages, roughly an 80-fold increase. The speedup comes from a multi-stream pipeline, batched recognition, and kernel fusion, which makes the tool suited to large-scale document processing. TurboOCR accepts various input formats over HTTP/gRPC and returns bounding boxes, text, and layout regions on demand.
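To make the idea of batched recognition concrete, here is a minimal sketch of one common way to form batches of cropped text lines. The source does not describe how TurboOCR builds its batches, so the width-bucketing strategy is an assumption, and `bucket_by_width` and `padded_columns` are hypothetical names used only for illustration. The sketch shows why grouping crops of similar width matters: each batch is padded to its widest crop, and padding is wasted GPU work.

```python
from typing import List


def bucket_by_width(widths: List[int], max_batch: int) -> List[List[int]]:
    """Group crop indices into batches of similar width (assumed strategy)."""
    if max_batch < 1:
        raise ValueError("max_batch must be >= 1")
    # Sort indices by crop width so neighbours in a batch have similar widths.
    order = sorted(range(len(widths)), key=lambda i: widths[i])
    return [order[i:i + max_batch] for i in range(0, len(order), max_batch)]


def padded_columns(widths: List[int], batches: List[List[int]]) -> int:
    """Total padding columns when every batch is padded to its widest crop."""
    total = 0
    for batch in batches:
        widest = max(widths[i] for i in batch)
        total += sum(widest - widths[i] for i in batch)
    return total


if __name__ == "__main__":
    # Hypothetical widths (in pixels) of six detected text-line crops.
    widths = [320, 48, 300, 64, 310, 56]
    arrival_order = [[0, 1, 2], [3, 4, 5]]  # batches in detection order
    bucketed = bucket_by_width(widths, max_batch=3)
    print("padding, arrival order:", padded_columns(widths, arrival_order))
    print("padding, width buckets:", padded_columns(widths, bucketed))
```

With these made-up widths, batching in arrival order pads 792 columns while width bucketing pads 54, so far less of each batch is filler.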

The tool is aimed at users who need real-time document indexing for retrieval-augmented generation (RAG) or who manage large PDF collections. Despite its speed, TurboOCR struggles with complex table extraction and structured output, which still call for VLM-based OCR such as PaddleOCR-VL. AIPtimizer is working on structured extraction and multi-language support while trying to preserve throughput. TurboOCR has been tested on Linux with RTX 50-series GPUs and CUDA 13.2, and is positioned as a cost-effective option for high-volume processing.
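For a sense of scale, the reported throughput figures can be turned into wall-clock estimates for a large collection. This is back-of-the-envelope arithmetic only: the rates are the figures quoted in this summary, the one-million-page collection is a hypothetical workload, and real throughput will vary with hardware and page density.

```python
def hours_to_process(num_images: int, images_per_second: float) -> float:
    """Wall-clock hours to OCR num_images at a constant rate."""
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    return num_images / images_per_second / 3600.0


def speedup(rate: float, baseline: float) -> float:
    """How many times faster rate is than baseline."""
    return rate / baseline


# Rates quoted in this summary, in images per second.
REPORTED_RATES = {
    "PaddleOCR (as cited)": 15.0,
    "TurboOCR, low end of range": 270.0,
    "TurboOCR, sparse pages": 1200.0,
}

if __name__ == "__main__":
    pages = 1_000_000  # hypothetical collection size
    for name, rate in REPORTED_RATES.items():
        hours = hours_to_process(pages, rate)
        factor = speedup(rate, REPORTED_RATES["PaddleOCR (as cited)"])
        print(f"{name}: {hours:.2f} h ({factor:.0f}x baseline)")
```

Under these assumptions, one million pages take about 18.5 hours at 15 images per second, about 1 hour at 270, and about 14 minutes at 1,200.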

Key Points
  • Processes up to 1,200 images per second on sparse pages using C++/CUDA.
  • Uses FP16 TensorRT inference with a multi-stream pipeline, batched recognition, and kernel fusion.
  • Suited to real-time RAG indexing and bulk document processing; complex tables and structured output still need VLM-based OCR.
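As background on the FP16 point, the sketch below shows what the half-precision number format trades away. It uses only Python's standard `struct` module and illustrates the format itself, not TensorRT or TurboOCR's engines, whose precision handling the source does not describe. `to_fp16` is a hypothetical helper name.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # "e" is the struct format code for a 16-bit float.
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    # Half precision stores each value in 2 bytes instead of 4.
    print("bytes per FP16 value:", struct.calcsize("<e"))
    print("bytes per FP32 value:", struct.calcsize("<f"))
    # The price is precision: about three significant decimal digits.
    print("0.1 in FP16:", to_fp16(0.1))
    print("1000.1 in FP16:", to_fp16(1000.1))
```

Halving the bytes per value cuts memory traffic, which is one reason FP16 inference tends to be faster on GPUs with hardware support for it, at the cost of coarser rounding.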

Why It Matters

By raising throughput from roughly 15 to as many as 1,200 images per second, TurboOCR cuts OCR time for high-volume workloads such as RAG indexing and bulk PDF processing.