
Gemini 2.5 Flash beats GPT models at inferring expertise from Slack logs

New study finds AI can estimate people's skills from their chat messages, though accuracy still has clear limits

Deep Dive

A new study titled "Can AI Guess What You Know?" by researchers Ko Watanabe and Shoya Ishimaru investigates whether large language models (LLMs) can infer an individual's domain knowledge directly from long-term Slack logs. The team analyzed 27,188 messages from 43 users and evaluated seven models from the Gemini, Claude, and GPT families. They compared the models' zero-shot skill estimates against self-reported skill ratings from 27 participants, using the mean absolute error (MAE) between the two as the evaluation metric.
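For readers unfamiliar with the metric, here is a minimal Python sketch of how an MAE comparison like this works. It assumes (the article does not say) that both the model estimates and the self-reported ratings sit on a 0–100 scale, so the MAE itself reads as a percentage. The function name `mean_absolute_error` and the sample ratings are hypothetical and are not data from the study.

```python
def mean_absolute_error(estimates: list[float], self_reports: list[float]) -> float:
    """Average absolute gap between model estimates and self-reported ratings.

    Assumption (hypothetical): both lists hold skill ratings on a 0-100 scale,
    paired by (person, skill) in the same order.
    """
    if len(estimates) != len(self_reports) or not estimates:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(e - s) for e, s in zip(estimates, self_reports))
    return total / len(estimates)


# Hypothetical ratings for one person across four skills (not study data)
model_estimates = [80.0, 40.0, 65.0, 10.0]
self_reported = [60.0, 50.0, 70.0, 30.0]

print(f"MAE: {mean_absolute_error(model_estimates, self_reported):.2f}%")
# prints "MAE: 13.75%"
```

Under that assumed scale, an MAE of 21.13% would correspond to an average miss of roughly 21 points per rating.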

Gemini 2.5 Flash was the top performer, with an MAE of 21.13%, while GPT models showed significantly larger discrepancies. Notably, estimation accuracy depended only weakly on message volume: more text alone does not guarantee better inference. These results demonstrate both the feasibility and the current limits of automated expertise mapping, and point to the need for privacy-preserving deployments and richer, structure-aware representations of human knowledge. The paper is available on arXiv (arXiv:2605.22971).

Key Points
  • Tested 7 LLMs from the Gemini, Claude, and GPT families on 27,188 Slack messages from 43 users
  • Gemini 2.5 Flash achieved the lowest error (MAE 21.13%); GPT models showed significantly larger discrepancies
  • Estimation accuracy was only weakly correlated with message volume – more text doesn't guarantee better inference

Why It Matters

Could automate expertise discovery in organizations, but raises privacy concerns and remains limited by estimation accuracy