Strange Evals - Benching Benchmarks
Education

Fri, June 19
03:30 - 05:00
Free
About the event

Join us for a paper reading club where we dig into benchmarks!


Each session explores widely cited benchmarks to help us build intuition for the evaluation landscape. We dissect the papers and examine raw benchmark samples. Surprisingly, many of them don't make much sense!

This week’s discussion takes a different angle: instead of focusing on a single benchmark, we’ll ask whether benchmark scores actually predict how models are used in the real world.

Presenter: Jake Boggs himself


Pre-reading: Can We Predict Model Usage from Benchmarks?


A special shoutout to HUD for hosting us! Check them out at HUD.ai 🏢

Please note that attendance will be limited to keep the discussion focused.
