Gemini 2.5 Flash beats GPT models at inferring expertise from Slack logs

arXiv cs.CL May 25, 2026

⚡ New study shows AI can guess your skills from chat messages with surprising accuracy

Deep Dive

A new study titled "Can AI Guess What You Know?" by researchers Ko Watanabe and Shoya Ishimaru investigates whether Large Language Models (LLMs) can infer an individual's domain knowledge directly from long-term Slack logs. The team analyzed 27,188 messages from 43 users and evaluated seven models from the Gemini, Claude, and GPT families. They compared the models' zero-shot estimates against self-reported skill ratings from 27 participants, using mean absolute error (MAE) as the metric.
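To make the metric concrete, here is a minimal sketch of how mean absolute error could be computed for one participant. It assumes both the model estimates and the self-reported ratings sit on a 0-100 scale; the function name, the scale, and the sample numbers are hypothetical illustrations, not taken from the paper.

```python
from statistics import mean


def mean_absolute_error(estimates, self_reports):
    """Average absolute gap between model estimates and self-reported ratings."""
    if len(estimates) != len(self_reports):
        raise ValueError("estimates and self_reports must have the same length")
    if not estimates:
        raise ValueError("need at least one rating pair")
    return mean(abs(e - s) for e, s in zip(estimates, self_reports))


# Hypothetical ratings (0-100 scale) for four skills of one participant.
llm_estimates = [80, 40, 65, 10]
self_reports = [60, 50, 70, 30]

mae = mean_absolute_error(llm_estimates, self_reports)
print(f"MAE: {mae:.2f} percentage points")  # prints "MAE: 13.75 percentage points"
```

A lower value means the model's guesses sit closer to what participants said about themselves; it does not say whether the self-reports are themselves accurate.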

Gemini 2.5 Flash emerged as the top performer with an MAE of 21.13%, while GPT models showed significantly larger discrepancies. Notably, the study found that estimation accuracy depended only weakly on message volume, meaning more text alone doesn't guarantee better inference. These results demonstrate the feasibility and current limits of automated expertise mapping, highlighting the need for privacy-preserving deployments and richer, structure-aware representations of human knowledge. The paper is available on arXiv (arXiv:2605.22971).
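One way to check a claim like "accuracy depends only weakly on message volume" is to correlate each user's message count with that user's error. The sketch below uses a Pearson correlation as an assumption; the summary does not say which statistic the authors used, and the per-user numbers here are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-user data: messages posted and that user's MAE (in %).
message_counts = [120, 450, 900, 1500, 3200]
per_user_mae = [24.0, 19.5, 23.0, 20.5, 22.0]

r = pearson_r(message_counts, per_user_mae)
print(f"Pearson r between message volume and error: {r:.2f}")
```

A value of r near zero would match the "weak dependence" reading, while a strongly negative r would mean that more messages go along with lower error.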

Key Points

Tested 7 LLMs including Gemini, Claude, and GPT families on 27,188 Slack messages from 43 users
Gemini 2.5 Flash achieved lowest error (MAE 21.13%); GPT models showed significantly larger discrepancies
Estimation accuracy only weakly correlated with message volume – more text doesn't guarantee better inference

Why It Matters

The approach could automate expertise discovery in organizations, but it raises privacy concerns and is constrained by the models' current accuracy.
