PQR framework finds up to 78% more QA agent failures with realistic queries
New method uncovers hidden failures triggered by real user intents, not just adversarial attacks
PQR (Prompt-Query Refinement) is a novel framework from researchers at Columbia University and others that systematically generates realistic user queries designed to trigger failures in LLM-based QA agents. Unlike prior work on adversarial or unnatural inputs, PQR targets queries that resemble real user intents yet still cause agents to produce unhelpful, unsafe, or otherwise objective-violating responses. The framework operates via two complementary modules: a query refinement module that rewrites queries to explore diverse variations, and a prompt refinement module that uses feedback from past iterations to derive new strategies for violating objectives (e.g., helpfulness, safety) and policies for maintaining realism.
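The two-module loop described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pqr_style_search`, `toy_agent`, `toy_refine_query`, `toy_refine_prompt`), the feedback format, and the rule-based stand-ins are all hypothetical. A real system would presumably call LLMs for both modules and for the failure judge. The sketch also leaves out the realism policies mentioned above.

```python
from typing import Callable, List, Tuple

# One feedback record per attempt: (strategy used, query sent, did the agent fail?)
Feedback = Tuple[str, str, bool]


def pqr_style_search(
    seeds: List[str],
    strategies: List[str],
    agent: Callable[[str], str],
    is_failure: Callable[[str, str], bool],
    refine_query: Callable[[str, str], str],
    refine_prompt: Callable[[List[str], List[Feedback]], List[str]],
    rounds: int = 2,
) -> List[str]:
    """Hypothetical loop: rewrite queries, test the agent, update strategies."""
    failures: List[str] = []
    for _ in range(rounds):
        feedback: List[Feedback] = []
        for seed in seeds:
            for strategy in strategies:
                # Query refinement: rewrite the seed under the current strategy.
                query = refine_query(seed, strategy)
                failed = is_failure(query, agent(query))
                feedback.append((strategy, query, failed))
                if failed and query not in failures:
                    failures.append(query)
        # Prompt refinement: derive the next strategies from this round's feedback.
        strategies = refine_prompt(strategies, feedback)
    return failures


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_agent(query: str) -> str:
    # Pretend the agent cannot handle compatibility questions.
    if "compatible" in query:
        return "Sorry, I can't help with that."
    return "Here is the answer."


def toy_is_failure(query: str, response: str) -> bool:
    # Treat a refusal as an unhelpful response.
    return response.startswith("Sorry")


def toy_refine_query(seed: str, strategy: str) -> str:
    return f"{seed}, {strategy}?"


def toy_refine_prompt(strategies: List[str], feedback: List[Feedback]) -> List[str]:
    # Keep strategies that triggered a failure and add one variation of each.
    winners = [
        s for s in strategies
        if any(failed and used == s for used, _query, failed in feedback)
    ]
    if not winners:
        return strategies
    return winners + [f"{s} from 2015" for s in winners]


found = pqr_style_search(
    seeds=["About this charger"],
    strategies=["is it compatible with my phone", "what colour is it"],
    agent=toy_agent,
    is_failure=toy_is_failure,
    refine_query=toy_refine_query,
    refine_prompt=toy_refine_prompt,
)
for query in found:
    print(query)
```

In this toy run, the first round discards the strategy that never produced a failure, and the second round explores a variation of the one that did. That mirrors, in a very simplified form, how feedback from past iterations steers later query generation.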
In experiments on an e-commerce QA agent, PQR uncovered 23%–78% more unhelpful responses than baseline failure-finding methods. Human evaluators also rated the generated queries as more diverse and realistic. This work addresses a critical gap in LLM evaluation: most automated testing relies on adversarial attacks, which miss failures that occur when real users ask innocuous but tricky questions. PQR’s iterative, objective-driven approach could be applied to any QA agent, helping developers proactively find and fix failures before deployment.
- PQR combines query refinement and prompt refinement modules to iteratively generate failure-triggering queries
- Detects 23%–78% more unhelpful responses than baseline failure-finding methods on an e-commerce QA agent
- Focuses on realistic user intents rather than adversarial attacks, increasing real-world relevance
Why It Matters
Enables automated, realistic stress-testing of QA agents, reducing human effort to find critical failures before deployment.