Language Model Representations for Efficient Few-Shot Tabular Classification
A new technique makes existing LLMs like GPT-4 competitive with specialized models on web tables using as few as 32 labeled examples.
Researchers from IBM and RPI developed TaRL (Table Representation with Language Model), a lightweight method for few-shot classification of web tables such as product catalogs. It combines two key techniques—removing common components from the embedding space and calibrating the softmax temperature—so that standard LLM embeddings perform comparably to state-of-the-art tabular models in low-data regimes (k ≤ 32 labeled examples per class). This lets companies reuse existing LLM infrastructure for structured-data tasks without costly retraining.
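The two techniques can be illustrated with a minimal sketch. This is not the authors' implementation: the exact TaRL procedure is not given here, so the code assumes a standard post-processing recipe (mean-centering plus removal of top principal directions, which often carry corpus-wide rather than class-specific signal) and a temperature-scaled softmax over cosine similarities to class prototypes. All function names and the `temperature` value are illustrative.

```python
import numpy as np

def remove_common_components(emb: np.ndarray, n_components: int = 1) -> np.ndarray:
    """Center embeddings and project out the top principal direction(s),
    which tend to encode dataset-wide rather than class-specific signal."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    # Rows are samples, columns are embedding dimensions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_components]                      # (n_components, d)
    return centered - centered @ top.T @ top     # remove projection onto top dirs

def classify(query: np.ndarray, prototypes: np.ndarray,
             temperature: float = 0.1) -> np.ndarray:
    """Softmax over cosine similarities to per-class prototype embeddings.
    A calibrated (typically < 1) temperature sharpens the distribution."""
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = p @ q / temperature
    exp = np.exp(logits - logits.max())          # stable softmax
    return exp / exp.sum()
```

In a few-shot setting, each class prototype would simply be the mean of the k available labeled-example embeddings for that class, computed after the common-component removal step.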
Why It Matters
Enables efficient classification of product catalogs and scientific data using existing AI infrastructure, reducing the need for specialized models.