Reading Group
Education


Mon, Jun 22
05:00 AM to 06:30 AM
500 Washington St
Free · See website
About the event

Join us for a discussion of the paper "Pre-Training Under Infinite Compute" by Kim et al. (2025). 🎉

🔍 Summary: Compute capacity is growing exponentially (4x yearly) while data grows much more slowly (1.03x), so we'll explore how to train models effectively when data, not compute, is the constraint. The study limits itself to a 200M-token corpus and reports several findings:

  1. Scaling up the number of epochs or parameters leads to overfitting.
  2. Heavy regularization proves optimal: 30x more weight decay than standard practice.
  3. Ensembling models and employing averaging or distillation are effective scaling tactics.

💡 Seed Questions to Ponder:

  • Could synthetic data generation also scale with compute? If so, does the paper's argument fail?
  • What would a paper on "post-training under infinite compute" address? What experiments would you propose?
  • Given that reinforcement learning is limited by feedback and verification, does that shift the conversation?
  • Do the findings about distilling from ensemble models suggest that pre-training is inefficient?

👉 Location: 500 Washington St
👉 Secure your spot: Get your ticket here!

Location

500 Washington St

