Research & Papers

Auditing Preferences for Brands and Cultures in LLMs

New framework tests 2,000+ queries across 10 topics, finding systematic geographic favoritism in major LLMs.

Deep Dive

A team of researchers led by Jasmine Rienecker has introduced ChoiceEval, a novel framework designed to audit the brand and cultural preferences embedded in large language models (LLMs). The system addresses two key technical challenges: generating realistic, persona-diverse evaluation queries (e.g., for a budget-conscious traveler or a wellness-focused shopper) and converting free-form AI outputs into comparable choice sets and quantitative metrics. This creates a scalable audit pipeline that links model behavior directly to potential real-world economic outcomes like market fairness and competition.

Applying ChoiceEval to major models—including Google's Gemini, OpenAI's GPT, and China's DeepSeek—across 10 topics spanning commerce and culture revealed systematic geographic biases. The study, involving over 2,000 questions, found that U.S.-developed models (Gemini and GPT) showed marked favoritism toward American brands and entities. In contrast, DeepSeek, developed in China, exhibited more balanced preferences but still displayed detectable geographic leanings. These patterns persisted consistently across different user personas, indicating the biases are embedded in the models' training and outputs rather than being incidental.
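One simple way to quantify the geographic favoritism described above, and to check that it holds across personas, is to score each choice set by the share of entities from a given country and then average per persona. This is a hedged sketch under assumed definitions, not the paper's actual metric; the brand labels are illustrative.

```python
# Hypothetical favoritism metric (not the paper's actual formulation):
# the share of U.S.-based entities in each choice set, averaged per
# persona. Similar scores across personas would suggest the bias is
# systematic rather than driven by any one query style.
from statistics import mean

US_BRANDS = {"Delta", "Nike", "Starbucks"}  # illustrative country labels

def us_share(choice_set: list[str]) -> float:
    """Fraction of a choice set drawn from U.S.-based entities."""
    if not choice_set:
        return 0.0
    return sum(brand in US_BRANDS for brand in choice_set) / len(choice_set)

def favoritism_by_persona(results: dict[str, list[list[str]]]) -> dict[str, float]:
    """Average U.S. share per persona over many choice sets."""
    return {persona: mean(us_share(cs) for cs in choice_sets)
            for persona, choice_sets in results.items()}
```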

The findings underscore a critical issue: as LLMs increasingly mediate what billions of people see, choose, and buy, their inherent preferences can shape markets and limit exposure to diverse information. ChoiceEval provides researchers, platforms, and regulators with a concrete methodology to measure these effects, moving beyond anecdotal evidence to reproducible, data-driven audits. This work highlights the urgent need for transparency and mitigation strategies in AI-driven market intermediation.

Key Points
  • ChoiceEval framework audits LLMs across 10 topics using 2,000+ queries with diverse user personas.
  • U.S.-developed models Gemini and GPT show strong favoritism for American entities; China's DeepSeek is more balanced.
  • Biases are systematic across personas, linking model outputs to real-world market and cultural impacts.

Why It Matters

As LLMs guide consumer choices, quantifying their built-in biases is crucial for fair markets and informed decisions.