
Reading Group (+🧋): JUDGEMENTBENCH: Comparing Rubric and Preference Evaluation for Quality Assessment

Name: Reading Group (+🧋): JUDGEMENTBENCH: Comparing Rubric and Preference Evaluation for Quality Assessment
Start: 2026-06-17T22:00:00+00:00
End: 2026-06-18T00:30:00+00:00
Location: 101 Second Street

Wed 17 June, 3:00 – 5:30 pm

101 Second Street · Free

About the event

Join the Snorkel AI Reading Group, a dynamic forum dedicated to exploring groundbreaking advancements in AI while fostering meaningful connections in our community. 🤝

In this insightful afternoon session, Russell Yang, an AI Engineering Fellow at Stanford Law School, will present his recent research paper: JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment.

Agenda:

3pm - Doors open
3:30pm - Talk begins

🧋 Enjoy Boba tea and other refreshments while you learn! 🧋🧋🧋

Key Takeaways:

What is JudgmentBench? A dataset of 30 real-world legal tasks with 1,539 rubric scores and 1,530 pairwise preference judgments, collected from practicing attorneys, including attorneys at major U.S. law firms.
Learn why this is the first public dataset in a specialized domain where both supervision signals, rubric scores and pairwise preferences, are collected from the same experts on the same items.
Explore why the choice between rubric scoring and comparative judgment, the two dominant approaches in current benchmarking, is so often made without justification.
Discover how comparative judgments substantially outperform rubrics at ordering items by quality (mean Spearman correlation of 0.908 vs. 0.150) while requiring less than half the annotation time.
See how this pattern holds for both human annotators and LLM autograders.
Delve into the broader research agenda this paired dataset opens up: how expert judgment should be elicited, aggregated, and used in fields that lack verifiable ground truth.
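The listing does not say which two rankings the 0.908 and 0.150 Spearman correlations compare, so the sketch below only illustrates the statistic itself: Spearman's rho is the Pearson correlation of two rank vectors, with tied values given their average rank. The `win_counts` helper is a hypothetical, deliberately simple way to turn pairwise preference judgments into a quality ordering; it is an assumption for illustration, not the aggregation method used in JudgmentBench, and the toy data is invented.

```python
from collections import Counter


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one ranking is constant")
    return cov / (vx * vy) ** 0.5


def win_counts(items, judgments):
    """Hypothetical aggregation: score each item by its number of pairwise wins.

    judgments is an iterable of (winner, loser) pairs.
    """
    wins = Counter(winner for winner, _loser in judgments)
    return [wins[item] for item in items]


# Toy data (invented for illustration): four items, all six pairwise judgments.
items = ["A", "B", "C", "D"]
judgments = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
scores = win_counts(items, judgments)  # [3, 2, 1, 0]
reference = [4, 3, 2, 1]  # hypothetical reference quality scores
print(spearman(scores, reference))  # 1.0: the two orderings agree exactly
```

A rho near 1 means two orderings largely agree, while a value near 0 means they share almost no rank structure. `scipy.stats.spearmanr` computes the same statistic; the hand-rolled version just makes the ranking step explicit.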