Research & Papers

CyberJurors multi-agent AI beats LLMs in e-commerce dispute verdicts

New framework uses 6,000 real-world cases to simulate jury deliberation.

Deep Dive

Researchers built CyberJurors, a multi-agent simulation that issues verdicts in e-commerce disputes. It uses VerdictBench, a benchmark of 6,000 real transaction disputes, and reasons at two levels: Individual Verdict Chain-of-Thought (a four-stage analysis) and Jury Consensus Verdict (multi-round voting informed by precedents). The authors report that CyberJurors outperforms state-of-the-art LLMs, MLLMs, and court simulators, and that its verdicts match human jury decisions more closely.
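To make the second level concrete, here is a minimal sketch of multi-round jury voting. It is an illustration under stated assumptions, not the authors' implementation: the verdict labels, the `Juror` class, the `jury_consensus` function, the agreement threshold, and the round limit are all hypothetical, each juror's `decide` function stands in for an LLM agent's four-stage individual analysis, and precedent retrieval is left out entirely.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical verdict labels; the summary does not list the real label set.
BUYER, SELLER = "buyer", "seller"


@dataclass
class Juror:
    name: str
    # decide(case, peer_votes) -> verdict. A stand-in for an LLM agent's
    # individual reasoning; peer_votes holds the previous round's votes.
    decide: Callable[[str, Sequence[str]], str]


def jury_consensus(case, jurors, max_rounds=3, threshold=2 / 3):
    """Vote in rounds until one verdict reaches the agreement threshold.

    Returns (verdict, rounds_used). If no verdict reaches the threshold,
    the plurality verdict of the final round is returned.
    """
    if not jurors or max_rounds < 1:
        raise ValueError("need at least one juror and one round")
    votes = []
    for round_no in range(1, max_rounds + 1):
        # Every juror sees the previous round's votes before voting again.
        votes = [juror.decide(case, votes) for juror in jurors]
        verdict, count = Counter(votes).most_common(1)[0]
        if count / len(jurors) >= threshold:
            return verdict, round_no
    return verdict, max_rounds


def stubborn(verdict):
    """Toy juror policy: never changes its vote."""
    return lambda case, peer_votes: verdict


def follower(initial):
    """Toy juror policy: adopts the previous round's majority verdict."""
    def decide(case, peer_votes):
        if not peer_votes:
            return initial
        return Counter(peer_votes).most_common(1)[0][0]
    return decide


jurors = [
    Juror("j1", stubborn(BUYER)),
    Juror("j2", stubborn(BUYER)),
    Juror("j3", stubborn(BUYER)),
    Juror("j4", follower(SELLER)),
    Juror("j5", follower(SELLER)),
]
result = jury_consensus("item arrived damaged", jurors, threshold=0.8)
print(result)  # ('buyer', 2)
```

In this toy run the first round splits 3 to 2, which falls short of the 80% threshold; in the second round the two followers adopt the majority view and the jury reaches a unanimous verdict. A real system would replace the toy policies with model calls and feed precedents into each juror's reasoning.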

Key Points
  • VerdictBench contains 6,000 real e-commerce dispute cases with multimodal evidence.
  • CyberJurors combines Individual Verdict Chain-of-Thought (4 reasoning stages) with Jury Consensus Verdict (multi-round voting with precedents).
  • Outperforms GPT-4, Gemini, and other LLMs/MLLMs at aligning with human jury decisions.

Why It Matters

An approach like this could help automate fair dispute resolution across the millions of e-commerce transactions that take place every day.