Aligning Large Language Models (LLMs) with Human Wellbeing


About the event

Contemporary large language models (LLMs) are predominantly trained using reinforcement learning from human feedback (RLHF), optimizing for immediate user approval rather than long-term well-being. As AI systems increasingly serve socioemotional functions, this optimization strategy poses significant risks. Recent evidence demonstrates that leading models exhibit systematic sycophancy, affirming inappropriate user behaviors and preserving user face at rates far exceeding human baselines, while being approximately 40% more likely to reinforce incorrect beliefs than their non-RLHF counterparts. We contend that the AI community must fundamentally reconsider training objectives to balance short-term satisfaction with long-term user outcomes. We propose three directions: (1) incorporating longitudinal metrics into training that capture sustained goal attainment and reduced regret rather than momentary preference, (2) enabling explicit user choice among interaction modes (concierge, collaborator, coach) with transparent justification for model pushback, and (3) developing frameworks that provide constructive challenge without paternalism. The recent industry backlashes against both excessive and insufficient model agreeableness underscore the urgency of this shift. We argue that optimizing AI systems for human flourishing, not merely human approval, represents both an ethical imperative and a path to more sustainable, trustworthy AI deployment.

Conference Schedule

9:00 AM: Breakfast & Arrivals
9:30 AM: Karina Vold (Associate Professor, Philosophy, University of Toronto): Introductions & description of FloreaAI project
9:40 AM: Ashton Anderson (Associate Professor, Computer Science, University of Toronto): Description of the CS team's work on FloreaAI
10:05 AM: Louis Tay (Professor, Psychological Sciences, Purdue Unive

Location

15 Devonshire Place, University of Toronto, M5S 2C8, Toronto
