Research & Papers

Scalable LLM-based Coding of Dialogue in Healthcare Simulation: Balancing Coding Performance, Processing Time, and Environmental Impact

New study shows LLMs can code 11,647 dialogue utterances across 6 constructs...

Deep Dive

Researchers from Monash University and partner institutions have published a study on arXiv (2604.23255) exploring how LLMs can automate dialogue coding in healthcare simulation debriefing. Using a dataset of 11,647 utterances coded across 6 dialogue constructs, they compared 4 prompt designs at varying batch sizes, evaluating coding performance, processing time, and energy consumption. They find that increasing batch size improves speed and reduces energy use but lowers coding accuracy. This work demonstrates the feasibility of LLM-based qualitative analysis for real-time feedback in training environments.
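The batching trade-off at the heart of the study can be illustrated with a minimal sketch. This is not the authors' code: the construct labels, prompt wording, and helper names below are placeholders invented for illustration; the idea is simply that grouping more utterances per prompt means fewer LLM calls (faster, less energy) at the cost of giving the model more to label per response.

```python
# Hypothetical sketch of batched LLM utterance coding.
# CONSTRUCTS are placeholder labels, not the study's actual 6 constructs.
CONSTRUCTS = ["question", "reflection", "feedback",
              "explanation", "agreement", "other"]

def make_batches(utterances: list[str], batch_size: int) -> list[list[str]]:
    """Split utterances into consecutive batches of at most batch_size."""
    return [utterances[i:i + batch_size]
            for i in range(0, len(utterances), batch_size)]

def build_prompt(batch: list[str]) -> str:
    """Assemble one coding prompt covering a whole batch of utterances."""
    numbered = "\n".join(f"{i + 1}. {u}" for i, u in enumerate(batch))
    return (
        f"Assign exactly one construct from {CONSTRUCTS} to each "
        "utterance below. Answer with one label per line.\n" + numbered
    )

# With 11,647 utterances, batch size directly sets the number of LLM calls:
utterances = [f"utterance {i}" for i in range(11_647)]
print(len(make_batches(utterances, 1)))    # 11647 calls at batch size 1
print(len(make_batches(utterances, 50)))   # 233 calls at batch size 50
```

Each batch's prompt would then be sent to the model and the returned labels parsed back onto the individual utterances; the study's finding is that pushing `batch_size` up shrinks call count and energy use but degrades label accuracy.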

The research addresses a critical gap: prior LLM coding studies focused on replicating human accuracy for research, not real-time use in settings like healthcare simulations where results must be fast, private, and sustainable. By optimizing prompt design and batching, the team shows how to balance these competing demands. The findings offer practical guidance for scaling dialogue analytics in contexts where timeliness, privacy, and sustainability are critical, paving the way for AI-powered feedback in team training.

Key Points
  • Tested 4 prompt designs across varying batch sizes on 11,647 utterances from healthcare simulation debriefing
  • Larger batch sizes improved processing speed and reduced energy consumption but reduced coding accuracy
  • Balances coding performance, processing time, and environmental impact for real-time training feedback

Why It Matters

Enables real-time, scalable dialogue analysis in healthcare training, balancing speed, accuracy, and sustainability.