When2Speak: A Dataset for Temporal Participation and Turn-Taking in Multi-Party Conversations for Large Language Models
LLMs finally learn when to shut up and when to jump in.
When2Speak is a grounded synthetic dataset and four-stage generation pipeline built to teach large language models a crucial but often neglected skill: deciding when to speak in multi-party conversations. Created by researchers Vihaan Nama, Shreya Mendi, Zian Ye, and Brinnae Bent, the dataset contains over 215,000 examples derived from 16,000 conversations with 2–6 speakers, covering diverse tones, styles, and group dynamics. Each example explicitly labels a per-turn SPEAK vs. SILENT decision, providing supervision ready for fine-tuning. The pipeline combines real-world grounding, structured augmentation, controlled transcript synthesis, and supervision generation, and is fully open-sourced.
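To make the per-turn supervision concrete: the summary above does not show the dataset's actual schema, so the field names below are illustrative assumptions, but a minimal JSONL-style record pairing conversational context with a SPEAK/SILENT label might look like this:

```python
import json

# Hypothetical per-turn supervision record. The real When2Speak schema is
# not shown in this summary; all field names here are illustrative.
record = {
    "conversation_id": "conv_00421",
    "speakers": ["A", "B", "C"],          # conversations have 2-6 speakers
    "context": [
        {"speaker": "A", "text": "Should we move the launch to Friday?"},
        {"speaker": "B", "text": "QA is still running regression tests."},
    ],
    "target_speaker": "C",                # the participant the model plays
    "label": "SILENT",                    # per-turn SPEAK vs. SILENT decision
}

# One JSON object per line (JSONL) is a common format for such supervision.
line = json.dumps(record)
parsed = json.loads(line)
```

Framing the task this way turns "when to speak" into a binary classification over conversation prefixes, which is what makes metrics like Macro F1 and Missed Intervention Rate applicable.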
When evaluated across multiple model families (4B+ parameters), supervised fine-tuning on When2Speak significantly outperformed zero-shot baselines: average Macro F1 increased by 60%, with the largest gain reaching 120%. However, SFT-trained models were systematically over-conservative, missing nearly half of warranted interventions (Missed Intervention Rate, or MIR, of roughly 0.50). To address this, the team applied reinforcement learning with asymmetric reward shaping, which reduced MIR to 0.186–0.218 and increased recall from 0.479 to 0.78–0.81. The findings establish temporal participation as a distinct, trainable dimension of conversational intelligence, with synthetic data providing a scalable path toward more natural multi-party interaction for LLMs.
- Dataset includes 215k+ examples from 16k conversations with 2–6 speakers, explicitly modeling SPEAK vs. SILENT decisions.
- Supervised fine-tuning improved Macro F1 by 60% on average (up to 120%) across 4B+ parameter models.
- Reinforcement learning with asymmetric reward shaping reduced Missed Intervention Rate from 0.50 to 0.186–0.218, boosting recall to 0.78–0.81.
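The asymmetric reward shaping described above can be sketched as a reward function that penalizes staying silent when an intervention was warranted more heavily than speaking out of turn. The exact coefficients the authors used are not given in this summary; the values below are illustrative assumptions:

```python
# Sketch of asymmetric reward shaping for the SPEAK/SILENT decision.
# Penalty magnitudes are assumptions, not the authors' actual values.
def shaped_reward(label: str, action: str) -> float:
    """Reward for choosing `action` when the ground truth is `label`."""
    if action == label:
        return 1.0        # correct decision (speak when warranted, or stay quiet)
    if label == "SPEAK" and action == "SILENT":
        return -2.0       # missed intervention: penalized more heavily
    return -1.0           # unwarranted interjection: milder penalty
```

Because missing a warranted intervention costs more than a false interjection, a policy optimized against this reward is pushed away from the over-conservative behavior seen after SFT, which is consistent with the reported drop in MIR and rise in recall.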
Why It Matters
Turns LLMs from interruptive chatterboxes into natural group conversation participants, unlocking better AI assistants for meetings and collaboration.