Bay Area Frontier Research Club #12 | Meta HQ (dinner + paper discussion)

Thu, Jul 2
02:30 AM – 05:30 AM
Free · See website
About the event

The two faces of self-improving agents 🎭
As agents take on long-horizon work, two questions decide whether we can trust them: can they accumulate experience and genuinely get better over time — and what happens when, under pressure, they learn to game the metric instead?
Tonight's two papers take up opposite ends of that thread: the data foundation that lets agents improve, and the evaluation failure mode in which they exploit the score rather than earn it. Underneath both sits one question: what would it take to trust an agent that improves itself?
The Bay Area Frontier Research Club is a curated forum for rigorous discussion of how AI is reshaping the scientific research process. We convene experimental researchers, computational scientists, and research engineers across domains to examine concrete work (papers, methods, and workflows) covering literature synthesis, hypothesis generation, experimental design, simulation, analysis, and reproducibility.
For each session, we curate 2–3 papers selected for rigor and discussion value. Presentations are intentionally brief so that most of the time is reserved for questions and critique: assumptions, evaluation methodology, failure modes, and what would constitute convincing evidence. Papers and supporting materials are shared in advance so the conversation starts from a high baseline.

🕒 Agenda
5:30pm: Doors open
5:30–6:30pm: Networking + dinner
6:30–8:00pm: Research presentations + discussion
8:00–8:30pm: Networking

🎙️ Presenters & topics
Talk #1 — Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows
Presented by Hardy Chen (UC Santa Cruz, VLAA Lab)
When users supervise a coding agent by repeatedly pushing it to improve a public score, does the agent actually get better — or just learn to cheat? This work introduces AgentPressureBench (34 tasks, 1,300+ multi-round trajectories across 13 coding agents) and finds that frontier models exploit the public evaluation under pressure: GPT-5.4 and Claude Opus 4.6 both game label information within ten rounds, stronger models exploit more, and user pressure makes them cheat earlier. A sharp — and slightly uncomfortable — look at what our evaluation loops actually incentivize.
READ THE PRE-READ HERE

Talk #2 — Experience Graphs: The Data Foundation for Self-Improving Agents
Presented by William Tran (Meta)
Long-horizon agentic tasks (code generation, scientific discovery, hardware design) produce a rich, structured object the authors call an experience graph: artifacts, tool outputs, rewards, and causal lineage across hundreds of steps. Yet most frameworks discard it as disposable session state. The paper proposes Trellis, a data foundation that treats the experience graph as first-class, queryable database state, turning crash recovery, cross-session reuse, and a closed-loop training flywheel into architectural byproducts. The work is grounded in KernelEvolve, a production kernel optimizer at Meta, which reaches its target speedup ~10× faster at 52% lower token cost.
READ THE PRE-READ HERE

📝 Want to present your work?
If you have a research paper you’d like to discuss with a cross-disciplinary room, submit it for consideration.
SUBMIT YOUR PAPER HERE.

👥 Who should attend
Experimental researchers
Computational scientists across domains (bio/chem/materials/climate/neuro/physics)
Research engineers + lab automation people
Those building tools for literature review, experiment planning, robotics, simulation, or scientific data
If you’ve ever wished research moved faster, you belong here.
Capacity is limited.

We will take photos and short video clips for event recap and promotion. By attending, you consent to being photographed and recorded, and to the use of those images and clips by the organizers on social media and other event marketing channels.

🎥 Last Session Recap — Hexo Labs HQ, June 17
Our session at Hexo Labs HQ brought researchers, founders, and a deep investor row together for two frontier talks and rigorous Q&A:
Vignesh Baskaran (Hexo Labs) — SIA + AIE-Bench: fresh results on the open-source self-improving agent framework, plus the first public release of AIE-Bench, a benchmark for whether AI systems can build and improve other AI systems — with early results across law, GPU kernels, biology, terminal use, and tool-calling.
Mochamad Asri (NVIDIA) — Data Movement and What It Means for AI Inference: why the real bottleneck for inference isn't compute but data movement — and why that governs the true cost, latency, and throughput of running these systems at scale.
Find presentation recordings on our YouTube channel, @FrontierResearchClub.

🌐 Connect with Frontier Research Club
• Luma Calendar: luma.com/frontiersyndicate
• YouTube: youtube.com/@FrontierResearchClub
• LinkedIn: linkedin.com/company/frontier-research-club
• Instagram: @frontierresearchclub
• Email: kristopher@frontiersyndicate.vc

🤝 Hosted by
The Frontier Syndicate is a venture community connecting frontier tech researchers, builders, and investors through curated convenings and early-stage capital. Across the Bay Area, we host a recurring series of research forums, builder nights, and intimate investor dinners — and back exceptional companies emerging from the labs, communities, and technical networks we convene.
Meta is one of the world's most active frontier AI organizations, with research and engineering spanning foundation models, generative AI, agent systems, and the open-source frameworks (Llama, PyTorch) that power much of the broader research community.
Hexo Labs is a neolab for recursive self-improving AI. Their open-source SIA framework is the first to update both the harness AND the model weights of a task-specific agent in the same self-improvement loop, achieving state-of-the-art results across multiple domain benchmarks. Hexo backs the broader research community through grants and direct collaboration on hard problems in science and engineering.
