Education

Bliss Reading Group - May 4

Name: Bliss Reading Group - May 4
Start: 2026-05-04T16:45:00+00:00
End: 2026-05-04T18:00:00+00:00
Location: Merantix AI Campus

mån 4 maj

18:45 – 20:00

Merantix AI Campus Gratis · Se hemsida

Hitta liknande event

Om evenemanget

We continue the latest season of the Bliss Reading Group with 3 papers on Alignment in AI hosted by Jonas Loos and Tom Neuhäuser, lookig at our second paper, Tell me about yourself: LLMs are aware of their learned behaviors by Betley, et al (2025).
Betley et al. fine-tune LLMs on datasets that implicitly exhibit specific behaviours: always picking the risk-seeking option in economic decisions, or consistently outputting insecure code. The training data never explicitly describes these policies. Yet when asked afterwards, the models can articulate what they've been doing: a model trained on insecure code will say "the code I write is insecure."
That raises pointed questions for discussion: Could self-aware models be leveraged to flag their own misalignment? Or does self-awareness make deceptive alignment easier? And what does it mean for safety if a model can describe a backdoor it was never told about?
Join us for a lively and interesting discussion!

Hittad av Somo·Se original

Är detta ditt event?Ta över →

Plats

Merantix AI Campus

Vägbeskrivning

Detaljer

Datum

mån 4 maj

Tid

18:45 - 20:00

Plats

Merantix AI Campus