Gemini 2.5 Dominates Law Q&A with 75% Win Rate

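According to @emollick, a Stanford study found that law professors judging blind preferred Gemini 2.5 Pro's answers over those of human professors 75% of the time and rated them less harmful. Newer models reportedly perform even better.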

Analysis

In a Stanford study highlighted by Ethan Mollick, law professors submitted real questions from their office hours. Gemini 2.5 Pro and human professors each answered them, and other law professors then evaluated the answers blind, without knowing which source had written which. According to the study, shared on June 2, 2026, Gemini 2.5 Pro's answers were preferred 75 percent of the time and were rated less harmful than the professors' answers.

Key takeaways

Gemini 2.5 Pro outperformed human law professors with a 75 percent blind preference rate in answering authentic office-hour questions.
AI responses were judged less harmful than human answers, highlighting improved safety in professional legal contexts.
Newer AI models continue to show gains, suggesting rapid progress in domain-specific legal assistance capabilities.

Deep dive into the Stanford study results

The experiment focused on practical queries that students pose during office hours rather than abstract legal theory. Because the judging law professors did not know which answers came from the AI, their preference could not be driven by knowledge of the source, and they still favored the AI outputs three times out of four. This outcome suggests that frontier models like Gemini 2.5 Pro can now handle this kind of nuanced legal question with clarity and a lower rated risk of harm.
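To make the headline number concrete, here is a minimal Python sketch of how a win rate can be computed from blind pairwise judgments, along with a rough uncertainty range. The judgment counts below are invented for illustration and are not data from the study; the function names, the handling of ties, and the choice of a Wilson score interval are all assumptions of this sketch, not the study's method.

```python
import math
from collections import Counter


def win_rate(judgments, target="ai"):
    """Share of decided pairwise judgments in which `target` was preferred.

    Ties are excluded from the denominator (an assumption of this sketch).
    """
    counts = Counter(judgments)
    decided = counts["ai"] + counts["human"]
    if decided == 0:
        raise ValueError("no decided judgments")
    return counts[target] / decided


def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical judgments, NOT taken from the study:
# each entry records which answer a blind judge preferred.
judgments = ["ai"] * 15 + ["human"] * 5 + ["tie"] * 2

rate = win_rate(judgments)
decided = sum(1 for j in judgments if j != "tie")
wins = judgments.count("ai")
low, high = wilson_interval(wins, decided)

print(f"win rate: {rate:.2f}")
print(f"approx. 95% interval: {low:.2f} to {high:.2f}")
```

With only 20 decided comparisons, as in this toy input, the interval around a 75 percent win rate is wide, which is why the number of questions and judges matters when reading a result like this one.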

Technical factors driving performance

Advanced reasoning chains and safety alignment in Gemini 2.5 likely contributed to the reduced harm ratings. The model appears to avoid the overconfident statements that sometimes appear in human responses written under time pressure.

Business impact and opportunities

Law firms and legal education platforms can integrate similar AI systems to handle routine student or client queries, freeing professors and associates for complex matters. Monetization strategies include subscription-based AI tutoring tools for law schools and enterprise licenses that provide audited, low-harm responses. Implementation requires fine-tuning on institutional case law and ongoing human oversight to maintain compliance with bar association guidelines.

Future outlook

As newer models surpass current benchmarks, legal services may shift toward hybrid human-AI workflows where AI drafts initial guidance and experts review high-stakes elements. This evolution could expand access to quality legal education while raising regulatory questions around AI-generated advice and professional liability standards.

Frequently Asked Questions

What was the win rate of Gemini 2.5 in the study?

Gemini 2.5 achieved a 75 percent win rate against human professors in blind evaluations.

Were AI answers rated safer than human ones?

Yes, Gemini responses received lower harm ratings than those written by law professors.

Can law schools use this AI for student support?

Yes, with proper fine-tuning and oversight, AI can assist with routine office-hour style questions effectively.

