2b or not 2b? Custom LLM Scheduling Competition [P]
A new competition asks participants to decide when to run a 2B-parameter model or skip it entirely to minimize costs.
A novel Kaggle competition is tackling a fundamental challenge in AI deployment: cost-efficient inference. Launched by an independent researcher, the "LLM Scheduling Competition" presents a simplified but critical problem. Participants are given questions from the popular MMLU (Massive Multitask Language Understanding) benchmark. Rather than answering the questions themselves, their algorithms must choose a scheduling action for each one: run a small 2-billion-parameter (2B) model to attempt an answer, or skip the question entirely.
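The submission interface isn't detailed here, but conceptually the task reduces to a per-question binary decision. A minimal sketch of that interface, where the `Action` names and the question-length heuristic are invented purely for illustration:

```python
from enum import Enum

class Action(Enum):
    RUN_2B = "run"   # invoke the 2B model on this question
    SKIP = "skip"    # skip the question entirely

def schedule(question: str) -> Action:
    """Toy scheduler: maps each question to a run/skip action.

    The heuristic below (short questions are likely easy) is a
    placeholder, not the competition's baseline.
    """
    return Action.RUN_2B if len(question) < 200 else Action.SKIP
```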
The core of the challenge is a custom cost-based scoring metric. Running the model consumes compute resources, which incurs a cost. However, skipping a question that the small model could have answered correctly is also penalized, as is running the model when it fails. The objective is to minimize the total weighted cost, forcing participants to develop smart classifiers or rule-based systems that predict when the small model is likely to succeed. This competition is a first step toward more complex, real-world scheduling where decisions involve multiple models of varying sizes, speeds, and costs.
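The actual weights and penalty values aren't specified in this summary; the sketch below shows how such a weighted cost could be computed, with all numbers as illustrative placeholders:

```python
# Hypothetical cost scheme: the competition's real weights are not
# published here, so these constants are placeholders.
RUN_COST = 1.0        # compute cost charged whenever the 2B model runs
WRONG_PENALTY = 2.0   # extra penalty when the model runs and answers incorrectly
MISSED_PENALTY = 3.0  # penalty for skipping a question the model would have solved

def question_cost(ran_model: bool, model_would_be_correct: bool) -> float:
    """Cost incurred on a single question under the assumed scheme."""
    if ran_model:
        # Running always pays compute; a wrong answer adds a penalty.
        return RUN_COST + (0.0 if model_would_be_correct else WRONG_PENALTY)
    # Skipping is free unless the model would have answered correctly.
    return MISSED_PENALTY if model_would_be_correct else 0.0

def total_cost(decisions, correctness):
    """Sum the weighted cost over all questions."""
    return sum(question_cost(d, c) for d, c in zip(decisions, correctness))
```

Under any such scheme, the optimal policy hinges on predicting `model_would_be_correct` before deciding, which is why the competition rewards good success predictors rather than good answerers.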
Currently, the setup is intentionally simple, with a fixed cost for running the model, but it establishes a framework for a major industry problem. As organizations deploy LLMs at scale, blindly sending every query to the most powerful (and most expensive) model is unsustainable. Intelligent routing, which sends easy queries to small, fast models and reserves large models for hard problems, is essential for reducing operational expenses. This competition crowdsources approaches to that routing logic, from simple heuristics to ML-based classifiers.
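As one illustration of the ML-based approach, a router could be trained on questions labeled with whether the 2B model answered them correctly. The scikit-learn pipeline, features, and decision threshold below are assumptions for the sketch, not a published baseline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: question texts paired with a label for
# whether the 2B model got each one right on a held-out set.
train_questions = [
    "What is the capital of France?",
    "Prove that the set of prime numbers is infinite.",
]
model_was_correct = [1, 0]  # 1 = the 2B model answered correctly

# TF-IDF features into logistic regression: a deliberately simple
# "success predictor" that drives the run/skip decision.
router = make_pipeline(TfidfVectorizer(), LogisticRegression())
router.fit(train_questions, model_was_correct)

def decide(question: str, threshold: float = 0.5) -> str:
    """Run the 2B model only when the predicted success probability
    clears a threshold (0.5 here is a placeholder)."""
    p_correct = router.predict_proba([question])[0, 1]
    return "run" if p_correct >= threshold else "skip"
```

In practice the threshold would be tuned against the actual cost weights rather than fixed at 0.5, since the relative prices of a wasted run and a missed answer determine how aggressive the scheduler should be.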
- Competition uses MMLU benchmark questions, requiring a scheduler to choose between running a 2B model or skipping.
- Scoring is based on a weighted cost metric that penalizes both unnecessary compute and missed answer opportunities.
- Aims to pioneer methods for intelligent query routing, a key to affordable large-scale AI deployment.
Why It Matters
Solving intelligent model routing is essential for making powerful AI applications cost-effective and scalable for businesses.