A new method makes AI training cheaper and more stable
Researchers find a smarter way to train AI models, cutting costs without sacrificing performance.
Training large AI models with reinforcement learning is computationally expensive. The new 'Jackpot' framework tackles this by using a cheaper, separate model to generate the training data. On its own, that shortcut normally makes training unstable, because the data no longer comes from the model being trained. Jackpot counters the problem with a sampling technique that aligns the cheaper model's data with the main AI's goals. In tests on a Qwen3-8B model, it matched the performance of far more expensive on-policy training, where the model generates its own data, for hundreds of update steps.
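To make the idea of aligning one model's samples with another more concrete, here is a minimal sketch of classic rejection sampling over a toy three-token vocabulary. This is an illustration only, not Jackpot's actual algorithm, which this summary does not spell out. The function names and the probability values are hypothetical, and the sketch assumes the cheaper model assigns nonzero probability to every token.

```python
import random

def acceptance_probs(target, proposal):
    """Acceptance probability per token: p(x) / (M * q(x)), where M = max_x p(x) / q(x).

    target   -- probabilities from the main model (hypothetical values)
    proposal -- probabilities from the cheaper model; assumed nonzero everywhere
    """
    ratios = [p / q for p, q in zip(target, proposal)]
    bound = max(ratios)  # M, the largest target-to-proposal ratio
    return [r / bound for r in ratios]

def rejection_sample(target, proposal, n_draws, rng):
    """Draw tokens from the cheap proposal and keep only the accepted ones.

    Accepted tokens follow the target distribution; the rest are discarded.
    """
    accept = acceptance_probs(target, proposal)
    tokens = list(range(len(proposal)))
    kept = []
    for _ in range(n_draws):
        tok = rng.choices(tokens, weights=proposal, k=1)[0]
        if rng.random() < accept[tok]:
            kept.append(tok)
    return kept

if __name__ == "__main__":
    main_model = [0.7, 0.2, 0.1]   # toy distribution for the model being trained
    cheap_model = [0.4, 0.4, 0.2]  # toy distribution for the cheaper generator
    kept = rejection_sample(main_model, cheap_model, 20_000, random.Random(0))
    for tok in range(3):
        print(tok, round(kept.count(tok) / len(kept), 3))
```

The trade-off in this textbook version is that every rejected sample is wasted work, so the further apart the two models are, the less of the cheap data survives.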
Why It Matters
This could significantly reduce the high computational cost of developing advanced AI systems.