PrefBench reveals LLMs excel at closing deals but fail at pricing for profit

New benchmark shows AI agents close more than 99% of deals but earn profit barely above a random baseline.

Deep Dive

A new research paper from Yingjie Lei presents PrefBench, a benchmark for evaluating zero-shot LLM agents in personalized pricing negotiations where buyer preferences (such as valuation, patience, and walkaway thresholds) are hidden from the seller. Each episode pairs a simulated buyer with a fixed vehicle-customization bundle; the seller observes public persona descriptors, bundle information, and the negotiation history, but must infer the latent traits. The benchmark enforces a strict JSON action protocol, which keeps the hidden-information boundary intact and requires agents to produce structured output.
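To make the protocol idea concrete, here is a minimal Python sketch of what a hidden-information boundary and a strict JSON action check could look like. It is not PrefBench's actual schema: the field names (`action`, `price`), the action set, the episode keys, and the helpers `seller_observation` and `parse_seller_action` are all assumptions made for illustration.

```python
import json

# Hypothetical schema: this summary does not give the benchmark's real field names.
PUBLIC_KEYS = ("persona", "bundle", "history")
ALLOWED_ACTIONS = {"offer", "accept", "end"}


def seller_observation(episode: dict) -> dict:
    """Copy only the public fields, so hidden buyer traits never reach the seller."""
    return {key: episode[key] for key in PUBLIC_KEYS}


def parse_seller_action(raw: str) -> dict:
    """Parse one seller turn; raise ValueError on any protocol violation."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")

    kind = action.get("action")
    if not isinstance(kind, str) or kind not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {kind!r}")

    # Reject any field outside the schema, e.g. a guessed hidden trait.
    extra = set(action) - {"action", "price"}
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")

    if kind == "offer":
        price = action.get("price")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)) or price <= 0:
            raise ValueError("an offer needs a positive numeric price")
        return {"action": "offer", "price": float(price)}

    if "price" in action:
        raise ValueError(f"{kind!r} must not carry a price")
    return {"action": kind}
```

Under these assumptions, `parse_seller_action('{"action": "offer", "price": 1200}')` returns a normalized offer, while free-form text or an action with extra fields is rejected. A check of this kind measures protocol compliance only; it says nothing about whether the offered price is a profitable one.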

Across 7,500 episodes, the tested LLMs reliably followed the protocol and closed deals at rates above 0.99. Their profit performance, however, was dismal: the best LLM's average profit only marginally beat a random baseline and fell far short of a simple concession heuristic. The results suggest that while LLMs can simulate agreement-seeking behavior, they lack profit-sensitive bargaining instincts. PrefBench provides a controlled environment for diagnosing these weaknesses and guiding future improvements in AI-driven negotiation systems.
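For readers unfamiliar with the two baselines, the sketch below shows one way a random-price seller and a fixed-step concession seller could be compared on profit. It is a toy model, not the paper's simulator: the `Buyer` acceptance rule, the 15% concession step, and every function name here are assumptions, and the code does not reproduce any reported number.

```python
import random
from dataclasses import dataclass


@dataclass
class Buyer:
    valuation: float  # hidden: highest price this toy buyer accepts
    patience: int     # hidden: number of offers tolerated before walking away


def run_episode(policy, buyer, cost, list_price):
    """Return seller profit: accepted price minus cost, or 0.0 if the buyer walks."""
    for round_idx in range(buyer.patience):
        price = policy(round_idx, cost, list_price)
        if price <= buyer.valuation:
            return price - cost
    return 0.0


def concession_policy(round_idx, cost, list_price, step=0.15):
    """Open at list price, then concede a fixed share of the margin per round."""
    margin = list_price - cost
    return max(cost, list_price - step * round_idx * margin)


def make_random_policy(rng):
    """Baseline that ignores history and draws a price between cost and list price."""
    def policy(round_idx, cost, list_price):
        return rng.uniform(cost, list_price)
    return policy


def mean_profit(policy, buyers, cost, list_price):
    """Average profit over a population of buyers, counting failed deals as 0."""
    profits = [run_episode(policy, b, cost, list_price) for b in buyers]
    return sum(profits) / len(profits)


def sample_buyers(rng, n, cost, list_price):
    """Draw toy buyers with valuations between cost and list price."""
    return [Buyer(rng.uniform(cost, list_price), rng.randint(1, 6)) for _ in range(n)]
```

A comparison would call `mean_profit` once per policy on the same buyers drawn from `sample_buyers`. The design point is that a failed deal and a deal closed at cost both earn zero profit, so a seller can post a near-perfect deal rate while earning very little.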

Key Points
  • PrefBench tests LLM sellers in 7,500 episodes with hidden buyer preferences (valuation, patience, walkaway).
  • LLMs achieved deal rates above 0.99, but their profit was barely above a random baseline and far below a simple concession heuristic.
  • Structured action compliance and high agreement rates coexist with weak profit optimization.

Why It Matters

PrefBench highlights the gap between an AI negotiator's protocol compliance and its actual profit optimization, a distinction that matters for enterprise pricing.