Post-mortem'ing my earliest ML research paper, 7 years later
Researcher re-examines 2019 'Assistive Bandits' paper that tackled human preference learning before GPT-2.
LawrenceC, a researcher, has published a detailed retrospective on LessWrong analyzing his earliest machine learning paper from 2019, 'The Assistive Multi-Armed Bandit.' The paper, published on arXiv just before GPT-2 was announced, studied a simplified version of Cooperative Inverse Reinforcement Learning (CIRL). Its core premise was novel: it explored how an AI agent (a 'robot') should assist a human partner who does not fully know their own reward function or preferences, a significant departure from standard IRL assumptions.
The post-mortem provides crucial historical context, situating the work within the AI alignment research landscape of the mid-2010s. At that time, a prominent proposed alignment strategy involved using Inverse Reinforcement Learning (IRL) on large datasets of human behavior to infer a reward function. LawrenceC's work, conducted during a 2017 internship at UC Berkeley's CHAI with researcher Dylan Hadfield-Menell, aimed to address a key flaw in CIRL models: the unrealistic assumption that humans know their own values perfectly. Instead, it modeled a collaborative process where both the human and AI learn about the human's preferences through interaction.
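The collaborative setup described above can be illustrated with a minimal toy simulation. This is not the paper's actual formulation; the signalling rule, the robot's tally-based policy, and the human's running-average belief update are all simplifying assumptions chosen for the sketch. The idea it captures is the one in the paragraph: the human starts with noisy estimates of their own arm rewards, and both parties learn through interaction as the robot's pulls generate observations that refine the human's beliefs.

```python
import random

def simulate_assistive_bandit(true_rewards, horizon=500, noise=1.0, seed=0):
    """Toy assistive-bandit sketch (illustrative, not the paper's model).

    Each round the human signals the arm they currently believe is best;
    the robot pulls the arm the human has signalled most often (after one
    forced pull of each arm), and the observed reward refines the human's
    estimate of the pulled arm.
    """
    rng = random.Random(seed)
    k = len(true_rewards)
    est = [rng.gauss(0, noise) for _ in range(k)]    # human's noisy beliefs
    counts = [0] * k                                 # pulls per arm
    signals = [0] * k                                # robot's tally of human signals
    total = 0.0
    for t in range(horizon):
        pref = max(range(k), key=lambda a: est[a])   # human signals current best guess
        signals[pref] += 1
        if t < k:
            arm = t                                  # robot explores each arm once
        else:
            arm = max(range(k), key=lambda a: signals[a])
        r = true_rewards[arm] + rng.gauss(0, 0.1)    # noisy reward observation
        counts[arm] += 1
        # Human updates their belief about the pulled arm via a running average.
        est[arm] += (r - est[arm]) / counts[arm]
        total += r
    return total / horizon, est
```

Because every arm is pulled at least once, the human's initial misestimates are quickly overwritten by real observations, after which their signals steer the robot toward the genuinely best arm: the human does not need to know their own reward function up front for the pair to converge on good behavior.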
This retrospective follows LessWrong's community practice of reviewing content at least a year after publication to assess its lasting value once the initial hype has faded. LawrenceC argues that such delayed analysis avoids fixating on immediate reception and allows for higher-level judgment of the research direction. The analysis covers the project's timeline from inception to publication and offers thoughts on the paper's standing today; a promised follow-up will review its technical contents. It serves as a case study in evaluating the trajectory of AI safety research ideas over time.
- The paper 'The Assistive Multi-Armed Bandit' explored CIRL with humans who don't know their own reward function, a novel twist in 2019.
- The research was conducted during a 2017 internship at UC Berkeley's CHAI, predating the large-language-model era that GPT-2 would usher in.
- The post-mortem adopts LessWrong's delayed-review model to assess the paper's long-term value, beyond critiques of how the project itself was managed.
Why It Matters
Offers a rare long-term lens on early AI alignment research, showing how foundational ideas evolved before the LLM era.