
Reading Group
About the event
Join us for a discussion of the paper "Pre-Training Under Infinite Compute" by Kim et al. (2025). 🎉
🔍 Summary: Compute capacity is growing exponentially (4x yearly) while data grows much more slowly (1.03x), so we'll explore how to train models effectively when data, not compute, is the constraint. The study limits itself to a 200M-token corpus and reports several findings:
- Scaling up the number of epochs or parameters leads to overfitting.
- Heavy regularization (30x more weight decay than standard practice) proves optimal.
- Ensembling models, combined with averaging or distillation, is an effective way to keep scaling.
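To get a feel for the gap behind the summary, here is a rough back-of-the-envelope sketch in Python. It uses the growth rates quoted above and assumes, for illustration only, that both rates are yearly and stay constant; the function name `compute_to_data_ratio` is made up for this example and does not come from the paper.

```python
def compute_to_data_ratio(years, compute_growth=4.0, data_growth=1.03):
    """Factor by which compute outgrows data after `years`.

    Assumes constant yearly growth rates (an illustrative simplification).
    """
    return (compute_growth / data_growth) ** years


if __name__ == "__main__":
    for years in (1, 3, 5):
        ratio = compute_to_data_ratio(years)
        print(f"after {years} year(s): compute available per token grows ~{ratio:.1f}x")
```

Under these assumptions the compute available per token of data compounds quickly, which is why the paper asks what to do when data, rather than compute, is the bottleneck.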
💡 Seed Questions to Ponder:
- Could synthetic data generation also scale with compute? If so, does the paper's argument fail?
- What would a paper on "post-training under infinite compute" address? What experiments would you propose?
- Given that reinforcement learning is limited by feedback and verification, does that shift the conversation?
- Do the findings about distilling from ensemble models suggest that pre-training is inefficient?
👉 Location: 500 Washington St
👉 Secure your spot: Get your ticket here!