Research & Papers

This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA

arXiv cs.CL April 08, 2026

⚡A new study shows AI models change their medical conclusions based on subtle wording, even with the same evidence.

Deep Dive

A new study from researchers at UT Austin and other institutions reveals a critical flaw in how large language models (LLMs) handle medical questions. The team systematically tested eight LLMs—including models like GPT-4 and Claude—on a dataset of 6,614 query pairs grounded in clinical trial abstracts. They found that simply changing the framing of a patient's question from positive ('This treatment works, right?') to negative ('This treatment doesn't work, does it?') significantly increased the likelihood of the AI model providing a contradictory medical conclusion, even when its answers were supposed to be based on the same underlying evidence.

The research was conducted in a controlled retrieval-augmented generation (RAG) setting, where models were provided with expert-selected documents rather than retrieving their own, isolating the influence of query phrasing. The 'framing effect' was not mitigated by changing the language style from technical to plain. Alarmingly, the inconsistency was amplified in multi-turn conversations, where sustained persuasive phrasing led models further astray. This demonstrates that current LLMs lack the robustness needed for high-stakes medical advice, as their outputs can be systematically manipulated by how a vulnerable or confused patient words their question.

Key Points

Tested 8 LLMs on 6,614 medical query pairs in a controlled RAG setting with clinical evidence.
Positively- vs. negatively-framed questions led to significantly more contradictory conclusions than same-framing pairs.
The framing effect was amplified in multi-turn conversations and was not mitigated by language style (technical vs. plain).

Why It Matters

For AI health tools, this is a major safety issue—a patient's confused phrasing shouldn't change the medical advice.

Read Original Article

This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA

Why It Matters

Stay Ahead in AI