In-Context Positive-Unlabeled Learning
New transformer handles PU classification without retraining on each dataset.
Positive-unlabeled (PU) learning tackles binary classification when only a set of labeled positives is available alongside a pool of unlabeled samples. Traditional methods require dataset-specific training or iterative optimization, making them slow to deploy across many tasks. Now, Siyan Liu and colleagues present PUICL (Positive-Unlabeled In-Context Learning), a pretrained transformer that solves PU classification entirely through in-context learning. PUICL is pretrained on synthetic PU datasets generated from randomly instantiated structural causal models, exposing it to diverse feature-label relationships and class-prior configurations. At inference, it receives the labeled positives and unlabeled samples as a single input and returns class probabilities for the unlabeled rows in one forward pass, with no gradient updates or per-task fitting.
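To make that interface concrete, below is a minimal sketch of in-context PU inference with a transformer encoder. The class name `InContextPUClassifier`, the row-flag encoding, and all dimensions are illustrative assumptions, not the authors' architecture; the point is that labeled positives and unlabeled rows are packed into one sequence and every unlabeled row is scored in a single forward pass.

```python
# Sketch of the in-context PU interface (assumed API, not the authors' code):
# labeled positives and unlabeled rows share one input sequence, and a
# transformer encoder scores every unlabeled row in a single forward pass.
import torch
import torch.nn as nn

class InContextPUClassifier(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Each row is embedded from its features plus a 2-dim flag:
        # [1, 0] = labeled positive, [0, 1] = unlabeled (label unknown).
        self.embed = nn.Linear(n_features + 2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)  # positive-class logit per row

    def forward(self, x_pos: torch.Tensor, x_unl: torch.Tensor) -> torch.Tensor:
        # x_pos: (n_pos, n_features) labeled positives; x_unl: (n_unl, n_features) unlabeled rows.
        pos_flag = x_pos.new_tensor([1.0, 0.0]).expand(x_pos.size(0), 2)
        unl_flag = x_unl.new_tensor([0.0, 1.0]).expand(x_unl.size(0), 2)
        rows = torch.cat([torch.cat([x_pos, pos_flag], dim=1),
                          torch.cat([x_unl, unl_flag], dim=1)], dim=0)
        h = self.encoder(self.embed(rows).unsqueeze(0)).squeeze(0)  # joint encoding of all rows
        # Probabilities are returned only for the unlabeled rows.
        return torch.sigmoid(self.head(h[x_pos.size(0):]).squeeze(-1))

# One forward pass, no gradient updates or per-task fitting.
model = InContextPUClassifier(n_features=8)
x_pos, x_unl = torch.randn(20, 8), torch.randn(100, 8)
with torch.no_grad():
    p = model(x_pos, x_unl)  # shape (100,): P(y = 1) for each unlabeled row
```

In PUICL the weights would come from pretraining on synthetic PU tasks; here the model is untrained and serves only to show the input and output shapes.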
Tested on 20 semi-synthetic PU benchmarks derived from the UCI Machine Learning Repository, OpenML, and scikit-learn, PUICL outperforms four standard PU learning baselines in average AUC and accuracy, and is competitive on F1-score. This work demonstrates that in-context learning extends naturally beyond fully supervised tabular prediction to semi-supervised PU settings, enabling rapid deployment on new PU tasks without dataset-specific tuning. The approach promises efficiency gains for applications like anomaly detection, medical diagnosis, and fraud detection where labeled positives are scarce.
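The semi-synthetic benchmarks start from fully labeled datasets and hide most of the labels. The exact protocol is not given in this summary, so the sketch below uses a common construction, assumed here: pick a positive class, reveal a random fraction of its examples as labeled positives under a selected-completely-at-random assumption, and treat every remaining row as unlabeled. The scikit-learn dataset and the 0.3 label frequency are illustrative choices.

```python
# Deriving a semi-synthetic PU task from a fully labeled dataset (assumed
# SCAR construction; the dataset and 0.3 label frequency are illustrative).
import numpy as np
from sklearn.datasets import load_breast_cancer

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)    # y in {0, 1}; treat y == 1 as positive

label_frequency = 0.3                          # fraction of positives revealed as labeled
pos_idx = np.flatnonzero(y == 1)
labeled = rng.choice(pos_idx, size=int(label_frequency * len(pos_idx)), replace=False)

s = np.zeros_like(y)                           # s = 1: labeled positive, s = 0: unlabeled
s[labeled] = 1

X_pos, X_unl = X[s == 1], X[s == 0]            # inputs handed to a PU method
y_unl = y[s == 0]                              # held-back ground truth, used only for evaluation
```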
- PUICL requires only one forward pass with no gradient updates per task.
- Outperforms 4 standard PU baselines on 20 semi-synthetic benchmarks (UCI, OpenML, scikit-learn).
- Pretrained on synthetic data from randomly instantiated structural causal models (see the data-generation sketch after this list).
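A minimal sketch of how one synthetic PU pretraining task could be drawn from a randomly instantiated structural causal model, under assumptions not stated in this summary: a sparse linear-Gaussian SCM over the features, a random logistic labeling function, and a fixed fraction of positives revealed as labeled. PUICL's actual generator may differ in all of these details.

```python
# One synthetic PU pretraining task from a random SCM (illustrative assumptions:
# linear-Gaussian mechanisms, logistic labeling, fixed label frequency).
import numpy as np

def sample_scm_pu_task(n_samples=256, n_features=8, label_frequency=0.3, seed=0):
    rng = np.random.default_rng(seed)
    # Random DAG over the features: node j may depend only on nodes i < j.
    W = np.triu(rng.normal(size=(n_features, n_features)), k=1)
    W *= rng.random((n_features, n_features)) < 0.4           # keep a sparse set of edges
    X = np.zeros((n_samples, n_features))
    for j in range(n_features):                               # ancestral sampling, parents first
        X[:, j] = X @ W[:, j] + rng.normal(size=n_samples)
    # Label is a random logistic function of the features, thresholded at its median.
    beta = rng.normal(size=n_features)
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    y = (p > np.median(p)).astype(int)                        # roughly balanced classes
    # PU masking: reveal a random fraction of positives, hide everything else.
    s = np.zeros(n_samples, dtype=int)
    pos = np.flatnonzero(y == 1)
    s[rng.choice(pos, size=int(label_frequency * len(pos)), replace=False)] = 1
    return X, s, y   # features, PU labels (context), true labels (pretraining targets)

X, s, y = sample_scm_pu_task(seed=42)
```

Sampling many such tasks with different seeds, edge densities, and label frequencies is what would expose a model to the diverse feature-label relationships and class priors described above.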
Why It Matters
Enables rapid PU classification on new tasks without per-dataset training or tuning.