Research & Papers

RAPT: New wrapper boosts multi-label classification 2x over few-shot LLMs

0.87 Macro-F1, 115x faster — no retraining needed.

Deep Dive

RAPT (Retrieval-Augmented Post-hoc Thresholding) addresses a core challenge in industrial multi-label document classification: how to dynamically set label selection thresholds per document without retraining the underlying model. Standard global thresholds fail under OCR noise, label imbalance, and evolving document formats. RAPT works as a model-agnostic wrapper that, for each query document, retrieves similar labeled documents and adapts the threshold using their known outcomes — e.g., by averaging label counts or calibrating cutoff scores. This lets any predictor (metric learning encoders or fine-tuned transformers) produce better label sets post-hoc.
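To make the per-document idea concrete, here is a minimal sketch of the label-count variant described above, not the paper's implementation. It assumes cosine similarity over document embeddings, a small in-memory index, and top-n selection, where n is the rounded mean label count of the retrieved neighbors. The function names `retrieve_neighbors` and `adaptive_label_set` and the toy data are hypothetical.

```python
import numpy as np

def retrieve_neighbors(query_emb, index_embs, k):
    """Return indices of the k indexed documents most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    m = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = m @ q
    return np.argsort(-sims)[:k]

def adaptive_label_set(scores, query_emb, index_embs, index_labels, k=2):
    """Keep the top-n scored labels, with n taken from the neighbors' label counts.

    Assumption: n is the rounded mean number of labels on the k neighbors,
    with a floor of one label per document.
    """
    nbrs = retrieve_neighbors(query_emb, index_embs, k)
    mean_count = index_labels[nbrs].sum(axis=1).mean()
    n = max(1, int(round(float(mean_count))))
    top = np.argsort(-scores)[:n]
    return sorted(top.tolist())

# Toy labeled index: 4 documents, 2-D embeddings, 3 possible labels (all hypothetical).
index_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
index_labels = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]])

# Query A sits near the two-label documents; a global 0.5 cutoff would keep only label 0.
scores_a = np.array([0.9, 0.4, 0.1])
labels_a = adaptive_label_set(scores_a, np.array([1.0, 0.05]), index_embs, index_labels)
print(labels_a)  # -> [0, 1]

# Query B sits near the one-label documents; a global 0.5 cutoff would keep nothing.
scores_b = np.array([0.2, 0.3, 0.45])
labels_b = adaptive_label_set(scores_b, np.array([0.05, 1.0]), index_embs, index_labels)
print(labels_b)  # -> [2]
```

The classifier is untouched in this sketch: only the selection step after scoring changes, which is why the approach needs no retraining. A production version would likely swap the brute-force similarity search for an approximate nearest-neighbor index.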

Evaluated on one industrial dataset and six public benchmarks, RAPT consistently outperformed global and label-wise static thresholds. The best results came from metric learning classifiers, which reached 0.87 Macro-F1 on the industrial corpus, while fine-tuned transformers averaged 0.775 Macro-F1. RAPT also beat few-shot LLM baselines (K=5) by 2x in accuracy while running inference 115x faster and using 13.5x less GPU memory, a substantial efficiency gain for production pipelines.

Key Points
  • RAPT is model-agnostic: works with any classifier that provides document embeddings and confidence scores.
  • Achieved 0.87 Macro-F1 on industrial data with metric learners, outperforming few-shot LLMs by 2x.
  • Runs inference 115x faster and uses 13.5x less GPU memory than few-shot LLM baselines.

Why It Matters

RAPT enables accurate multi-label classification without costly retraining or LLM inference, which makes it a good fit for high-volume document pipelines.