
Reading Group (+ Boba): JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment
About the event
Join the Snorkel AI Reading Group, a dynamic forum dedicated to exploring groundbreaking advancements in AI while fostering meaningful connections in our community.
In this insightful afternoon session, Russell Yang, an AI Engineering Fellow at Stanford Law School, will present his recent research paper: JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment.
Agenda:
- 3pm - Doors open
- 3:30pm - Talk begins
Enjoy boba tea and other refreshments while you learn!
Key Takeaways:
- What is JudgmentBench? A unique dataset comprising 30 real-world legal tasks with 1,539 rubric scores and 1,530 pairwise preference judgments, sourced from practicing attorneys including those from major U.S. law firms.
- Learn why this is the first public dataset in a specialized domain where both supervision signals are gathered from the same experts on identical items.
- Explore why the choice between rubric scoring and comparative judgment is so often left unjustified, even though the two approaches dominate current benchmarking.
- Discover how comparative judgments significantly outperform rubrics at quality ordering (mean Spearman correlation of 0.908 vs. 0.150) while requiring less than half the annotation time.
- Understand how this pattern holds true for both human annotators and LLM autograders.
- Delve into the broader research agenda opened by this paired dataset on how expert judgment should be effectively elicited, aggregated, and utilized in fields lacking verifiable ground truth.
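
For readers new to the metric in the takeaways above, here is a minimal sketch of how a Spearman rank correlation can compare two kinds of supervision against a reference quality ordering. Everything in it is an assumption made for illustration: the four items, the reference ordering, the toy rubric scores, the toy pairwise outcomes, and the win-count aggregation are hypothetical and are not taken from JudgmentBench or from the paper's method.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def win_counts(items, pairwise):
    """Turn (winner, loser) judgments into one score per item by counting wins."""
    wins = defaultdict(int)
    for winner, _loser in pairwise:
        wins[winner] += 1
    return [wins[item] for item in items]


# Hypothetical toy data: four responses with a known quality ordering A < B < C < D.
items = ["A", "B", "C", "D"]
reference = [1, 2, 3, 4]

# Toy rubric scores that barely separate the responses (many ties).
rubric_scores = [3, 3, 4, 3]

# Toy pairwise judgments as (winner, loser) pairs.
pairwise = [("D", "A"), ("D", "B"), ("D", "C"), ("C", "A"), ("C", "B"), ("B", "A")]

rho_rubric = spearman(reference, rubric_scores)
rho_pairwise = spearman(reference, win_counts(items, pairwise))
print(f"rubric: {rho_rubric:.3f}  pairwise: {rho_pairwise:.3f}")
```

Counting wins is only one simple way to aggregate pairwise judgments into an ordering; it is used here to keep the sketch short.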
JudgmentBench is a collaborative effort among Stanford, Harvey AI, and Snorkel AI.
Location: 101 Second Street
Reserve Your Spot!