Education

Reading Group

Name: Reading Group
Start: 2026-06-22T03:00:00+00:00
End: 2026-06-22T04:30:00+00:00
Location: 500 Washington St

Mon, Jun 22

05:00 AM – 06:30 AM

500 Washington St · Free · See website

About the event

Join us for an engaging discussion centered around the paper Pre-Training Under Infinite Compute by Kim et al. (2025). 🎉

🔍 Summary: Compute capacity is growing exponentially (4x yearly) while data grows much more slowly (1.03x), so we'll explore how to train models effectively when data, rather than compute, is the constraint. The study restricts itself to a 200M-token corpus and reports several findings:

Scaling up epochs or parameter count leads to overfitting.
Heavy regularization (30x more weight decay than standard practice) proves optimal.
Ensembling models and employing averaging or distillation are effective scaling tactics.
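
To get a feel for the growth rates quoted in the summary, here is a minimal Python sketch that compounds them over a few years. It assumes both the 4x and 1.03x figures are yearly multipliers that stay constant; the function name `growth_gap` and the chosen horizons are illustrative and do not come from the paper.

```python
def growth_gap(years, compute_rate=4.0, data_rate=1.03):
    """Compound two yearly growth multipliers over `years`.

    Returns (compute multiple, data multiple, compute-to-data ratio).
    The default rates are the figures quoted in the event summary and are
    assumed here to be constant yearly multipliers.
    """
    compute = compute_rate ** years
    data = data_rate ** years
    return compute, data, compute / data


if __name__ == "__main__":
    # Print how far compute outpaces data after 1, 5, and 10 years.
    for years in (1, 5, 10):
        compute, data, ratio = growth_gap(years)
        print(f"{years:>2} yr: compute x{compute:,.0f}, "
              f"data x{data:.2f}, gap x{ratio:,.0f}")
```

Under these assumptions the gap between available compute and available data widens by roughly a factor of four each year, which is the premise behind studying a fixed, small corpus.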

💡 Seed Questions to Ponder:

Could synthetic data generation also scale with compute? If so, does the paper's argument fail?
What would a paper on "post-training under infinite compute" address? What experiments would you propose?
Given that reinforcement learning is limited by feedback and verification, does that shift the conversation?
Do the findings about distilling from ensemble models suggest that pre-training is inefficient?


Source: Lu
