New mechanism ensures truthful worker feedback for LLM fine-tuning in mobile crowdsourcing

When strategic workers misreport for payment or influence, EM-based aggregation suffers linear regret O(T). A new online mechanism cuts that to O(√T).

Deep Dive

Mobile crowdsourcing platforms, such as those for navigation and traffic prediction, increasingly rely on LLM-generated content that must be aligned with human preferences. However, workers (mobile users) may strategically misreport their preferences to maximize their influence or payment. Existing pipelines, such as EM-based weight estimation, fail to identify the most accurate worker in this online setting, which leads to linear regret O(T) over T time slots.
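For readers unfamiliar with the term, regret can be read as the gap between the platform's cumulative loss and that of the single most accurate worker in hindsight. The notation below is an illustrative assumption, not the paper's own: a loss function ℓ, the platform's aggregated estimate ŷ_t, worker i's report x_{i,t}, and the realized outcome y_t, with N workers.

```latex
R(T) = \sum_{t=1}^{T} \ell(\hat{y}_t, y_t)
       - \min_{i \in \{1,\dots,N\}} \sum_{t=1}^{T} \ell(x_{i,t}, y_t)
```

Under this reading, regret that grows linearly in T means the average per-slot gap R(T)/T stays bounded away from zero, whereas R(T) = O(√T) means R(T)/T shrinks toward zero as T grows.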

To solve this, Hao and Duan formulate a dynamic Bayesian game to model the multi-agent online learning interaction between the platform and strategic workers. They propose a novel online weighted aggregation mechanism that dynamically adjusts each worker's weight according to their feedback accuracy. The mechanism provably ensures truthful feedback from workers and achieves sublinear regret O(√T). An extension to scenarios with limited per-slot feedback also guarantees O(√T) regret. Experiments on LLM fine-tuning with real-world data demonstrate significant gains over benchmark methods.
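To make the idea of accuracy-driven weighting concrete, here is a minimal Python sketch. It is not the authors' mechanism: it swaps in the standard exponentially weighted average forecaster (multiplicative weights) and ignores strategic behavior and payments entirely. The function names, the squared-error loss, the simulated noise levels, and the assumption that the true outcome is revealed after every slot are all hypothetical choices made for illustration.

```python
import math
import random


def aggregate_online(reports, truths, eta=None):
    """Exponentially weighted aggregation of worker reports.

    reports: list of T rounds, each a list of N worker scores in [0, 1].
    truths:  list of T realized outcomes in [0, 1], assumed to be revealed
             after each round (an assumption of this sketch).
    Returns (final_weights, regret_vs_best_worker).
    """
    num_rounds, num_workers = len(reports), len(reports[0])
    if eta is None:
        # Standard tuning for losses bounded in [0, 1] with a known horizon.
        eta = math.sqrt(8 * math.log(num_workers) / num_rounds)

    weights = [1.0 / num_workers] * num_workers
    platform_loss = 0.0
    worker_loss = [0.0] * num_workers

    for round_reports, truth in zip(reports, truths):
        # Aggregate with the current weights (they always sum to 1).
        estimate = sum(w * r for w, r in zip(weights, round_reports))
        platform_loss += (estimate - truth) ** 2

        # Penalize each worker by its squared error, then renormalize.
        for i, report in enumerate(round_reports):
            loss = (report - truth) ** 2
            worker_loss[i] += loss
            weights[i] *= math.exp(-eta * loss)
        total = sum(weights)
        weights = [w / total for w in weights]

    return weights, platform_loss - min(worker_loss)


def simulate(num_rounds=2000, noise_levels=(0.05, 0.3, 0.5), seed=0):
    """Generate honest but noisy reports; worker 0 is the most accurate."""
    rng = random.Random(seed)

    def clip(x):
        return min(1.0, max(0.0, x))

    truths = [rng.random() for _ in range(num_rounds)]
    reports = [[clip(y + rng.gauss(0.0, s)) for s in noise_levels]
               for y in truths]
    return reports, truths


if __name__ == "__main__":
    reports, truths = simulate()
    weights, regret = aggregate_online(reports, truths)
    print("final weights:", [round(w, 4) for w in weights])
    print("regret vs. best worker:", round(regret, 3))
```

For this stand-in technique, with convex losses bounded in [0, 1] and the learning rate shown, the regret against the best worker is at most √(T ln N / 2), which matches the O(√T) order the paper reports. The guarantee assumes honest reports, though. Making truthful reporting a worker's best strategy is the part the paper's mechanism contributes, and this sketch does not attempt it.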

Key Points
  • Workers can strategically misreport preferences to maximize payment/influence, causing linear regret in traditional EM-based methods.
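  • A dynamic Bayesian game models the interaction between the platform and strategic workers; the proposed online weighted aggregation mechanism provably ensures truthful reporting by adjusting each worker's weight according to feedback accuracy.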
  • Achieves sublinear regret O(√T) even in challenging scenarios with limited worker feedback per time slot.

Why It Matters

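Makes LLM fine-tuning in mobile crowdsourcing more efficient and fair: with truthful feedback and O(√T) regret, the platform's average per-slot loss approaches that of its most accurate worker, reducing cost and improving alignment for real-world apps.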