FSA-GRPO teaches auditory LLMs to learn from few examples
New RL method boosts speech recognition for kids without training on children's data
Few-shot prompting is a powerful way to adapt large language models to new tasks, but most auditory LLMs are not explicitly trained to use in-context demonstrations effectively. To close this gap, researchers from the University of Illinois at Urbana-Champaign (UIUC) introduce Few-Shot Aware GRPO (FSA-GRPO), a reinforcement learning post-training method that applies Group Relative Policy Optimization (GRPO) with a reward function designed to encourage the model to attend to and use the few-shot examples in its prompt. The resulting model adapts more effectively from few-shot examples without requiring task-specific fine-tuning.
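To make the "group relative" part of GRPO concrete, here is a minimal Python sketch of how a group of sampled transcripts for one prompt could be scored and turned into advantages. It is an illustration under stated assumptions, not the authors' implementation: the article does not specify the FSA-GRPO reward, so `asr_reward` (1 minus word error rate) is a hypothetical stand-in, and the helper names and the choice of population standard deviation are likewise assumptions.

```python
from statistics import mean, pstdev


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def asr_reward(reference: str, hypothesis: str) -> float:
    """Hypothetical reward: higher when the transcript has fewer word errors."""
    return 1.0 - word_error_rate(reference, hypothesis)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: three sampled transcripts for the same few-shot prompt.
reference = "the cat sat on the mat"
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat is on the mat",
]
rewards = [asr_reward(reference, s) for s in samples]
advantages = group_relative_advantages(rewards)
```

Samples that score above their group's average receive a positive advantage and are reinforced; those below receive a negative one. Because the baseline comes from the group itself, no separate value model is needed.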
Remarkably, training with FSA-GRPO solely on high-resource adult automatic speech recognition (ASR) data generalizes to multiple low-resource tasks. The method yields significant gains not only in children's speech recognition (the target low-resource scenario) but also in speech translation and general audio understanding. The authors also study how best to select training data and weight auxiliary rewards. Their experiments show that when in-domain data is unavailable or cannot be used, FSA-GRPO is more effective than directly fine-tuning on related out-of-domain data, making it a practical option for real-world deployment where labeled data is scarce.
- Uses reinforcement learning (GRPO) with a custom reward to teach auditory LLMs to leverage few-shot demonstrations
- Trained only on high-resource adult ASR data, yet improves children's speech, speech translation, and audio understanding
- Outperforms direct fine-tuning on related out-of-domain data when in-domain data is unavailable
Why It Matters
FSA-GRPO lets auditory LLMs adapt to low-resource tasks without costly in-domain data collection, saving time and money.