New mechanism ensures truthful worker feedback for LLM fine-tuning in mobile crowdsourcing
Strategic workers lie for payment? This new algorithm reduces regret from O(T) to O(√T).
Mobile crowdsourcing platforms, such as those for navigation and traffic prediction, increasingly rely on LLM-generated content that must be aligned with human preferences. However, workers (mobile users) may strategically misreport their preferences to maximize their influence or payment. Existing pipelines, such as EM-based (expectation-maximization) weight estimation, fail to identify the most accurate worker in this online setting, so regret grows linearly, O(T), over T time slots.
To solve this, Hao and Duan formulate a dynamic Bayesian game to model the multi-agent online learning interaction between the platform and strategic workers. They propose a novel online weighted aggregation mechanism that dynamically adjusts each worker's weight according to their feedback accuracy. The mechanism provably ensures truthful feedback from workers and achieves sublinear regret O(√T). An extension to scenarios with limited per-slot feedback also guarantees O(√T) regret. Experiments on LLM fine-tuning with real-world data demonstrate significant gains over benchmark methods.
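To make the idea of accuracy-based reweighting concrete, here is a minimal Python sketch. It is not Hao and Duan's mechanism: it uses the textbook Hedge (exponential weights) update as a stand-in, and it assumes that the true value is revealed after each slot, that reports and truths lie in [0, 1], and that workers are honest but noisy. The names `hedge_aggregate` and `simulate`, the noise levels, and the learning rate are all illustrative choices. Hedge on its own gives O(√T) regret against the best single worker, but it does not provide the truthfulness guarantee the paper claims for strategic workers.

```python
import math
import random


def hedge_aggregate(reports, truths, eta):
    """Aggregate worker reports with exponentially decayed weights.

    reports[t][i] is worker i's report in slot t, truths[t] is the value
    revealed afterwards; both are assumed to lie in [0, 1].
    Returns (final normalized weights, regret vs. the best single worker).
    """
    n = len(reports[0])
    weights = [1.0 / n] * n
    platform_loss = 0.0
    worker_loss = [0.0] * n
    for report, truth in zip(reports, truths):
        # Platform's estimate: weighted average of the current reports.
        estimate = sum(w * r for w, r in zip(weights, report))
        platform_loss += abs(estimate - truth)
        # Penalize each worker in proportion to its absolute error.
        for i, r in enumerate(report):
            loss = abs(r - truth)
            worker_loss[i] += loss
            weights[i] *= math.exp(-eta * loss)
        # Renormalize so the weights stay a probability vector.
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights, platform_loss - min(worker_loss)


def simulate(num_slots=1000, noise_levels=(0.05, 0.3, 0.5), seed=0):
    """Synthetic run: worker i reports the truth plus Gaussian noise."""
    rng = random.Random(seed)
    truths = [rng.random() for _ in range(num_slots)]
    reports = [
        [min(1.0, max(0.0, t + rng.gauss(0.0, s))) for s in noise_levels]
        for t in truths
    ]
    # Standard Hedge learning rate for losses bounded in [0, 1].
    eta = math.sqrt(8.0 * math.log(len(noise_levels)) / num_slots)
    return hedge_aggregate(reports, truths, eta)


if __name__ == "__main__":
    weights, regret = simulate()
    print("final weights:", [round(w, 3) for w in weights])
    print("regret vs best worker:", round(regret, 3))
```

In this synthetic run the weight should concentrate on the least noisy worker, and the measured regret stays within the standard Hedge bound of √(T ln N / 2) for N workers. The paper's mechanism additionally has to keep workers from gaming the weights, which this sketch does not attempt.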
- Workers can strategically misreport preferences to maximize payment/influence, causing linear regret in traditional EM-based methods.
- The interaction is modeled as a dynamic Bayesian game, and the proposed online weighted aggregation mechanism ensures truthful reporting by dynamically weighting workers based on their feedback accuracy.
- The mechanism achieves sublinear regret O(√T), and an extension keeps that guarantee when worker feedback per time slot is limited.
Why It Matters
The mechanism makes LLM fine-tuning in mobile crowdsourcing more efficient and fair, reducing cost and improving alignment for real-world apps.