Hierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas
A hierarchical system learns multiple evidence-grounded personas from noisy user data...
Researchers led by Nayoung Choi of Emory University have introduced a hierarchical framework for inducing multiple evidence-grounded personas from user behavioral logs. The method, described in a paper submitted to arXiv, handles noisy, interleaved user data in two stages: it first aggregates actions into intent memories, then clusters and labels those memories to produce interpretable natural-language personas. Unlike earlier methods that focus solely on downstream utility, it optimizes persona quality directly.
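The two-stage flow described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the log schema, the session-based grouping, the Jaccard-based greedy clustering, and the keyword labels (the paper generates natural-language personas, which this stand-in does not attempt) are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy log of (session_id, action_text); the real log schema is not given.
LOG = [
    ("s1", "search running shoes"),
    ("s1", "view trail running shoes"),
    ("s2", "search pasta recipe"),
    ("s2", "view tomato pasta recipe"),
    ("s3", "buy running socks"),
]

# Assumed action verbs, dropped so that only content words describe the intent.
ACTION_VERBS = {"search", "view", "buy"}


def build_intent_memories(log):
    """Aggregate raw actions into one token bag per session (stand-in for an intent memory)."""
    memories = defaultdict(Counter)
    for session_id, action in log:
        memories[session_id].update(
            tok for tok in action.lower().split() if tok not in ACTION_VERBS
        )
    return dict(memories)


def jaccard(a, b):
    """Jaccard overlap between the token sets of two memories."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cluster_memories(memories, threshold=0.2):
    """Greedy one-pass clustering: join the first cluster whose seed memory is similar enough."""
    clusters = []  # each cluster is a list of session ids; the first id is the seed
    for sid in sorted(memories):
        for cluster in clusters:
            if jaccard(memories[sid], memories[cluster[0]]) >= threshold:
                cluster.append(sid)
                break
        else:
            clusters.append([sid])
    return clusters


def label_cluster(cluster, memories, top_k=2):
    """Label a cluster with its most frequent tokens and keep the supporting sessions as evidence."""
    counts = Counter()
    for sid in cluster:
        counts.update(memories[sid])
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = [tok for tok, _ in ranked[:top_k]]
    return {"persona": "interested in: " + ", ".join(top), "evidence": list(cluster)}


if __name__ == "__main__":
    memories = build_intent_memories(LOG)
    for cluster in cluster_memories(memories):
        print(label_cluster(cluster, memories))
```

On this toy log, sessions s1 and s3 share the token "running" and fall into one cluster, while s2 forms its own. Each label carries the session ids that support it, which is the sense in which a persona stays tied to its evidence.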
The framework formulates persona induction as an optimization problem over three metrics: cluster cohesion, persona-evidence alignment, and persona truthfulness. To train the model, the team uses a groupwise extension of Direct Preference Optimization (DPO). In experiments on a large-scale service log and two public datasets, the induced personas are more coherent, evidence-grounded, and trustworthy than those produced by baselines, and they also improve the accuracy of future interaction predictions. The result is a way of modeling user behavior in which each persona can be traced back to the logged evidence that supports it.
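The article does not spell out how the groupwise extension of DPO is defined, so the sketch below shows only one plausible reading: score every candidate in a group by persona quality, then average the standard pairwise DPO loss over all pairs whose scores differ. The dictionary keys, the scalar quality score, and the pair-averaging scheme are assumptions for illustration, not the paper's formulation.

```python
import math
from itertools import combinations


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_loss(policy_w, ref_w, policy_l, ref_l, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) pair of sequence log-probabilities."""
    margin = beta * ((policy_w - ref_w) - (policy_l - ref_l))
    return -log_sigmoid(margin)


def groupwise_dpo_loss(candidates, beta=0.1):
    """Average the pairwise DPO loss over all pairs in a group with different quality scores.

    Each candidate is a dict with hypothetical keys:
      'score'  : scalar persona-quality score (assumed to combine cohesion,
                 persona-evidence alignment, and truthfulness)
      'policy' : log-probability under the model being trained
      'ref'    : log-probability under the frozen reference model
    """
    losses = []
    for a, b in combinations(candidates, 2):
        if a["score"] == b["score"]:
            continue  # ties carry no preference signal
        winner, loser = (a, b) if a["score"] > b["score"] else (b, a)
        losses.append(
            dpo_pair_loss(winner["policy"], winner["ref"], loser["policy"], loser["ref"], beta)
        )
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Hypothetical group of three candidate persona sets for the same user.
    group = [
        {"score": 0.9, "policy": -1.0, "ref": -2.0},
        {"score": 0.5, "policy": -2.0, "ref": -2.0},
        {"score": 0.1, "policy": -3.0, "ref": -2.0},
    ]
    print(groupwise_dpo_loss(group))
```

When the trained model already assigns relatively more probability to higher-scoring candidates than the reference model does, as in the example group, the loss falls below log 2, the value it takes when the model expresses no preference.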
- Hierarchical framework aggregates user actions into intent memories, then clusters and labels them for multi-persona induction
- Optimizes persona quality via three metrics: cluster cohesion, persona-evidence alignment, and truthfulness
- Uses groupwise Direct Preference Optimization (DPO) for training; outperforms baselines on a large-scale service log and two public datasets
Why It Matters
Enables AI to build accurate, transparent user profiles from noisy data, improving personalization and trust.