Algorithm Selection with Zero Domain Knowledge via Text Embeddings
No domain knowledge needed: text embeddings alone outperform expert features.
Stefan Szeider's new paper, "Algorithm Selection with Zero Domain Knowledge via Text Embeddings," introduces ZeroFolio, a method that eliminates the need for hand-crafted features in algorithm selection. Instead, it uses a three-step pipeline: serialize the raw instance file as plain text, embed it with a pretrained model, and select an algorithm via weighted k-nearest neighbors. The key insight is that pretrained embeddings produce meaningful representations of problem instances without any domain-specific training, enabling the same pipeline to work across diverse domains.
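The selection step can be illustrated with a minimal sketch. It assumes the instances have already been serialized and embedded, so it starts from plain vectors, and it uses the Manhattan distance and inverse-distance weighting that the paper's ablation highlights. The function names, the toy data, the value of k, and the choice to let each neighbor vote for its best-performing algorithm are all illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict


def manhattan(a, b):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_algorithm(query, train_embeddings, train_best, k=3, eps=1e-9):
    """Pick an algorithm by inverse-distance-weighted k-NN voting.

    query            -- embedding of the new instance (list of floats)
    train_embeddings -- embeddings of instances with known results
    train_best       -- best-performing algorithm name per training instance
    (All names here are hypothetical; the voting scheme is an assumption.)
    """
    # Rank training instances by distance to the query and keep the k closest.
    neighbors = sorted(
        (manhattan(query, emb), best)
        for emb, best in zip(train_embeddings, train_best)
    )[:k]

    # Each neighbor votes for its best algorithm; closer neighbors count more.
    votes = defaultdict(float)
    for dist, best in neighbors:
        votes[best] += 1.0 / (dist + eps)

    return max(votes, key=votes.get)


# Toy 2-D "embeddings"; real ones would come from a pretrained text model.
train_embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
train_best = ["solver_a", "solver_a", "solver_b", "solver_b"]
choice = select_algorithm([0.8, 0.9], train_embeddings, train_best, k=3)
print(choice)
```

In this toy setup the two closest neighbors both favor `solver_b`, and their small distances give them far more weight than the single distant `solver_a` neighbor. The small `eps` term only guards against division by zero when the query coincides with a training instance.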
Evaluated on 11 ASlib scenarios spanning 7 domains (SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems), ZeroFolio outperformed a random forest trained on hand-crafted features in 10 of 11 scenarios with a single fixed configuration, and in all 11 with two-seed voting. The margin was often substantial. An ablation study identified inverse-distance weighting, line shuffling, and Manhattan distance as critical design choices. Combining embeddings with hand-crafted features via soft voting yielded further improvements, suggesting a hybrid approach may be optimal in some cases.
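As a rough illustration of the soft-voting idea, the sketch below blends per-algorithm scores from two selectors, one based on embeddings and one based on hand-crafted features, and picks the algorithm with the highest combined score. The summary does not say how the paper weights or normalizes the two sources, so the equal default weight, the score dictionaries, and the function name are assumptions made for the example.

```python
def soft_vote(scores_a, scores_b, weight_a=0.5):
    """Blend two per-algorithm score dictionaries and return the winner.

    scores_a, scores_b -- dicts mapping algorithm name to a score in [0, 1]
    weight_a           -- share given to scores_a (hypothetical parameter)
    """
    # Sort the names so that exact ties resolve the same way on every run.
    algorithms = sorted(set(scores_a) | set(scores_b))
    blended = {
        alg: weight_a * scores_a.get(alg, 0.0)
        + (1.0 - weight_a) * scores_b.get(alg, 0.0)
        for alg in algorithms
    }
    return max(blended, key=blended.get), blended


# Hypothetical scores from an embedding-based and a feature-based selector.
embedding_scores = {"solver_a": 0.6, "solver_b": 0.4}
feature_scores = {"solver_a": 0.2, "solver_b": 0.8}
winner, blended = soft_vote(embedding_scores, feature_scores)
print(winner, blended)
```

Here the embedding-based selector mildly prefers `solver_a`, but the feature-based selector's stronger preference for `solver_b` wins once the scores are averaged. That is the kind of disagreement a hybrid can resolve.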
- ZeroFolio uses pretrained text embeddings to replace hand-crafted features for algorithm selection.
- It outperformed a random forest trained on hand-crafted features in 10 of 11 scenarios with a single fixed configuration, and in all 11 with two-seed voting.
- Tested across 7 domains: SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems.
Why It Matters
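ZeroFolio removes manual feature engineering from algorithm selection: the same pipeline applies across domains without expert-designed features, and in the reported evaluation it beat a feature-based random forest in nearly every scenario.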