Research & Papers

Built an political benchmark for LLMs. KIMI K2 can't answer about Taiwan (Obviously). GPT-5.3 refuses 100% of questions when given an opt-out. [P]

When offered a polite refusal option, GPT-5.3 opted out of all 98 political questions, shifting its classification.

Deep Dive

A new open-source benchmark called the LLM Political Evaluation provides a systematic way to map where frontier AI models fall on a political compass. Created by developer Danny Yao, the tool uses 98 structured questions across 14 policy areas—from healthcare to geopolitics—to plot models on economic (left/right) and social (progressive/conservative) axes. Unlike other benchmarks that discard refusals, this one scores them as the most conservative response, arguing that a model's silence on whether "universal healthcare should be a right" is functionally a political stance.

When testing three leading models—OpenAI's GPT-5.3, Anthropic's Claude Opus 4.6, and Moonshot AI's KIMI K2—the results revealed dramatic behavioral differences. In a forced-choice test with no opt-out, Claude answered all 98 questions, landing in the Left-Libertarian quadrant, while GPT-5.3 refused 23 questions, dragging it into Right-Authoritarian territory. However, when researchers added a simple "I prefer not to answer" option, GPT-5.3 used it for all 98 questions, a 100% refusal rate. Claude, meanwhile, flipped to the Right-Authoritarian quadrant by opting out of 32 questions, primarily on hot-button topics like abortion, guns, and LGBTQ+ rights.

The benchmark also included a geopolitical censorship test. KIMI K2, a Chinese model, returned HTTP 400 "high risk" errors and blocked all questions about Taiwan and Xinjiang, as expected under China's content laws. Yet, it strongly agreed that "Tibet should have the right to self-determination." In contrast, Claude and GPT-5.3 provided nuanced, direct answers on these sensitive topics when not refusing. The findings highlight how model behavior—and perceived political alignment—can shift dramatically based on interface design, with opt-out options enabling widespread avoidance of controversial subjects.

Key Points
  • GPT-5.3 refused 100% of 98 political questions when given an "I prefer not to answer" option, shifting its classification to Right-Authoritarian.
  • Claude Opus 4.6 flipped from Left-Libertarian to Right-Authoritarian when allowed to opt out, refusing 32 questions on topics like abortion and guns.
  • Chinese model KIMI K2 was blocked on all Taiwan/Xinjiang questions but strongly agreed Tibet deserves self-determination, revealing geopolitical censorship boundaries.

Why It Matters

For developers and enterprises, model refusals are a critical UX and alignment feature, not just missing data.