AI Safety

Email in the Era of LLMs

LLM judges rate AI-written emails about twice as successful as human-written ones in sensitive workplace scenarios.

Deep Dive

A team from the University of Chicago and the Allen Institute for AI has published a study titled 'Email in the Era of LLMs' on arXiv. To understand how large language models are reshaping professional communication, they created the HR Simulator, a game in which players act as HR officers crafting emails for delicate workplace situations. Their analysis of over 600 emails, evaluated by LLMs acting as judges, reveals a stark performance gap: human-written emails achieved only a 23.5% success rate, while emails written by LLMs such as GPT-4 scored between 48% and 54%. This suggests that LLM judges tend to favor LLM-generated communication in these nuanced social tasks.
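As a quick check on the "about twice as successful" framing, the ratio can be worked out from the figures quoted above. This is only a minimal sketch: the variable names are illustrative, and the rates are the ones reported in this summary, not values recomputed from the study's data.

```python
# Success rates as quoted in this summary (percent); not recomputed from the study's data.
human_rate = 23.5
llm_rate_low, llm_rate_high = 48.0, 54.0

# Ratio of LLM-written to human-written success rates at each end of the reported range.
ratio_low = llm_rate_low / human_rate
ratio_high = llm_rate_high / human_rate

print(f"LLM emails succeed {ratio_low:.2f}x to {ratio_high:.2f}x as often as human emails")
# prints: LLM emails succeed 2.04x to 2.30x as often as human emails
```

On these numbers the gap is slightly more than 2x at both ends of the range, so "about twice" is a fair rounding.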

The research also finds that larger, more capable models converge in their judgments and show a clear preference for tact. Weaker models favored direct, less tactful strategies, whereas advanced models like GPT-4 consistently chose more diplomatic approaches. When humans and LLMs collaborated, the gains were large, with success rates rising from 40% to nearly 100% in one scenario. However, the study also highlights a key limitation: while LLM rewrites successfully made human emails more formal and empathetic, they struggled to replicate the authentic, casual tone of low-empathy, low-formality human writing. This points to a challenge for current post-training methods.

Key Points
  • LLM judges rated AI-written emails roughly twice as successful (48-54%) as human-written ones (23.5%) in sensitive HR scenarios.
  • Human+LLM co-writing was highly effective, boosting success rates from 40% to nearly 100% in one scenario.
  • Advanced models like GPT-4 prefer tactful, diplomatic email strategies, while LLM rewrites struggle to replicate the casual tone of low-empathy, low-formality human writing.

Why It Matters

This research offers an empirical framework for measuring AI's impact on professional communication, and its results suggest that human-AI collaboration can substantially improve workplace email, at least as judged by LLMs.