Developer Tools

Stanford study: AI beats law professors in answering student questions

AI won 75% of head-to-head matchups in blind evaluations by law professors.

Deep Dive

A study by Stanford Law School Professor Julian Nyarko, titled “Law Professors Prefer AI Over Peer Answers,” tested whether large language models could serve as effective tutors for contract law courses. Sixteen law professors from top U.S. law schools (including Yale, NYU, and the University of Chicago) created 40 representative questions, wrote their own answers, and then blindly evaluated nearly 3,000 anonymized comparisons. AI responses were preferred 75% of the time over answers written by fellow professors. The AI systems performed comparably to the best human instructor, and professors flagged AI responses as pedagogically harmful only 3.5% of the time, compared to 12% for peer-written answers. The study tested multiple AI systems, including commercial tutoring models and Google’s NotebookLM. Performance varied across systems, but AI answers were consistently preferred over human-written ones.

“We were frankly surprised by the magnitude of the results,” Nyarko said. “These weren’t just simple questions with obvious answers. Many required synthesizing complex material, applying it to new situations, and explaining legal concepts.” The research team calibrated AI responses to match human answers in length and structure, and used multiple evaluation methods. Co-author Sarath Sanga of Yale Law School noted that in law, unlike fields with clear right answers, “two opposing arguments can both be good,” and the study shows AI can meet the latent professional standard for evaluating arguments. The findings arrive as law schools debate integrating AI tools. The authors caution against wholesale adoption but suggest AI tutoring could broaden access to expert guidance in judgment-rich fields.

Key Points
  • AI responses were preferred 75% of the time in blind comparisons by 16 law professors across nearly 3,000 anonymized comparisons.
  • AI answers were flagged as pedagogically harmful only 3.5% of the time vs. 12% for peer-written answers.
  • The study tested multiple systems, including Google’s NotebookLM and commercial tutoring models, and suggests AI can handle nuanced legal reasoning, not just factual recall.

Why It Matters

AI could supplement legal education by providing expert-level tutoring in judgment-heavy fields, challenging assumptions about human-only instruction.