no code implementations • ICLR 2019 • Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca Dragan
Our goal is to infer reward functions from demonstrations.
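A minimal sketch of reward inference from demonstrations, assuming a Boltzmann-rational demonstrator choosing among a finite set of trajectories summarized by feature vectors (an illustrative toy, not the paper's method):

```python
import numpy as np

# Toy reward inference from demonstrations: the demonstrator is assumed to
# choose trajectory tau with probability proportional to exp(theta . phi(tau))
# over a finite candidate set. We recover theta by maximizing the
# log-likelihood of the demonstrated choices with gradient ascent.

rng = np.random.default_rng(0)
features = rng.normal(size=(10, 3))   # phi(tau) for 10 candidate trajectories
true_theta = np.array([1.0, -0.5, 2.0])

def choice_probs(theta):
    logits = features @ theta
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Sample demonstrations from the Boltzmann demonstrator.
demos = rng.choice(len(features), size=200, p=choice_probs(true_theta))

theta = np.zeros(3)
for _ in range(500):                  # gradient ascent on log-likelihood
    p = choice_probs(theta)
    # grad log P(demo) = phi(demo) - E_p[phi]
    grad = features[demos].mean(axis=0) - p @ features
    theta += 0.5 * grad

print("recovered theta:", theta)      # approximately recovers true_theta
```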
no code implementations • 2 May 2024 • Jerry Zhi-Yang He, Sashrika Pandey, Mariah L. Schrum, Anca Dragan
Proper usage of the context enables the LLM to generate personalized responses, whereas inappropriate contextual influence can lead to stereotypical and potentially harmful generations (e.g., associating "female" with "housekeeper").
no code implementations • 20 Mar 2024 • Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, Allan Dafoe, Toby Shevlane
To understand the risks posed by a new AI system, we must understand what it can and cannot do.
no code implementations • 9 Mar 2024 • Evan Ellis, Gaurav R. Ghosal, Stuart J. Russell, Anca Dragan, Erdem Biyik
Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task.
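A minimal sketch of the standard setup, assuming the commonly used Bradley-Terry preference model with a linear reward over segment features (illustrative only):

```python
import numpy as np

# Preference-based reward learning sketch: the human prefers segment A over
# segment B with probability sigma(R(A) - R(B)), where R is a linear reward
# over segment features. Fitting the reward weights reduces to logistic
# regression on feature differences.

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def segment_features(n):
    return rng.normal(size=(n, 2))    # stand-in for summed per-step features

A, B = segment_features(500), segment_features(500)
p_prefer_A = 1 / (1 + np.exp(-(A - B) @ true_w))
labels = rng.random(500) < p_prefer_A  # simulated human preference labels

w = np.zeros(2)
for _ in range(1000):
    p = 1 / (1 + np.exp(-(A - B) @ w))
    grad = (A - B).T @ (labels - p) / len(labels)  # gradient of log-likelihood
    w += 0.5 * grad

print("learned reward weights:", w)   # approximately recovers true_w
```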
1 code implementation • 5 Mar 2024 • Cassidy Laidlaw, Shivam Singhal, Anca Dragan
Thus, we propose regularizing based on the occupancy measure (OM) divergence between policies, rather than the action distribution (AD) divergence, to prevent reward hacking.
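A toy illustration of the motivation (not the paper's algorithm): in a small chain MDP, a tiny per-state change in the action distribution can compound into a large shift in which states get visited, so the OM divergence can be much larger than the AD divergence.

```python
import numpy as np

# Chain MDP: action 0 stays, action 1 moves right. The two policies differ
# only slightly in their per-state action distributions.
n_states, horizon = 20, 19
pi_safe = np.tile([0.9, 0.1], (n_states, 1))
pi_hack = np.tile([0.8, 0.2], (n_states, 1))

def occupancy(pi):
    d = np.zeros(n_states); d[0] = 1.0
    occ = d.copy()
    for _ in range(horizon):
        nxt = np.zeros(n_states)
        for s in range(n_states):
            nxt[s] += d[s] * pi[s, 0]                         # stay
            nxt[min(s + 1, n_states - 1)] += d[s] * pi[s, 1]  # move right
        d = nxt
        occ += d
    return occ / occ.sum()            # normalized state occupancy measure

ad_tv = 0.5 * np.abs(pi_safe - pi_hack).sum(axis=1).max()   # worst per-state AD distance
om_tv = 0.5 * np.abs(occupancy(pi_safe) - occupancy(pi_hack)).sum()
print(f"AD total-variation distance (per state): {ad_tv:.3f}")
print(f"OM total-variation distance:             {om_tv:.3f}")
```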
no code implementations • 27 Feb 2024 • Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons
Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment.
1 code implementation • 13 Dec 2023 • Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan
Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neural networks.
no code implementations • 9 Nov 2023 • Joey Hong, Sergey Levine, Anca Dragan
LLMs trained with supervised fine-tuning or "single-step" RL, as in standard RLHF, may struggle with tasks that require such goal-directed behavior, since they are not trained to optimize for overall conversational outcomes across multiple turns of interaction.
no code implementations • 31 Oct 2023 • Joey Hong, Anca Dragan, Sergey Levine
Theoretically, we show that standard offline RL algorithms conditioned on observation histories suffer from poor sample complexity, in accordance with the above intuition.
no code implementations • 26 Oct 2023 • Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann
In this short consensus paper, we outline risks from upcoming, advanced AI systems.
1 code implementation • 3 Oct 2023 • W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum
Most recent work assumes that human preferences over trajectory segments are generated based only upon the reward accrued within those segments, i.e., their partial return.
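A minimal sketch of that assumed preference model, with a Boltzmann choice over partial returns (illustrative only):

```python
import numpy as np

# The "partial return" preference model: the probability of preferring one
# segment over another depends only on the reward accrued within each segment.

def partial_return(rewards, gamma=1.0):
    return sum(r * gamma**t for t, r in enumerate(rewards))

def p_prefer(seg1, seg2, beta=1.0):
    # Boltzmann (logistic) choice over the two partial returns.
    d = beta * (partial_return(seg1) - partial_return(seg2))
    return 1 / (1 + np.exp(-d))

print(p_prefer([1, 0, 1], [0, 0, 1]))  # higher accrued reward -> preferred more often
```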
no code implementations • 31 Jul 2023 • Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan
To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them.
no code implementations • 27 Jul 2023 • Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals.
no code implementations • 30 Jun 2023 • Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine
Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to.
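A rough sketch of that alignment objective, with placeholder encoders and dimensions (the actual architecture and loss in the paper may differ): an InfoNCE-style contrastive loss pulls each instruction embedding toward the embedding of its matching start-to-goal visual change.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Linear layers stand in for pretrained image/text encoders; dims are made up.
img_encoder = nn.Linear(512, 128)
txt_encoder = nn.Linear(768, 128)

def alignment_loss(start_emb, goal_emb, instr_emb):
    # Align language to the CHANGE between start and goal, not the goal alone.
    change = F.normalize(img_encoder(goal_emb - start_emb), dim=-1)
    lang = F.normalize(txt_encoder(instr_emb), dim=-1)
    logits = lang @ change.T / 0.07          # temperature-scaled similarities
    targets = torch.arange(len(logits))      # i-th instruction matches i-th change
    return F.cross_entropy(logits, targets)

# Dummy batch: 8 (start, goal, instruction) triples with precomputed features.
start = torch.randn(8, 512); goal = torch.randn(8, 512); instr = torch.randn(8, 768)
print(alignment_loss(start, goal, instr))
```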
no code implementations • 14 Jun 2023 • Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh
We additionally illustrate our approach with a robot on 2 carefully designed surfaces.
1 code implementation • NeurIPS 2023 • Cassidy Laidlaw, Stuart Russell, Anca Dragan
Using BRIDGE, we find that prior bounds do not correlate well with when deep RL succeeds vs. fails, but discover a surprising property that does.
1 code implementation • 8 Mar 2023 • Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt
Auditing large language models for unexpected behaviors is critical to preempt catastrophic deployments, yet remains challenging.
no code implementations • 2 Jan 2023 • Ran Tian, Masayoshi Tomizuka, Anca Dragan, Andrea Bajcsy
Interestingly, robot actions influence what this experience is, and therefore influence how people's internal models change.
no code implementations • 9 Dec 2022 • Joey Hong, Kush Bhatia, Anca Dragan
This raises the question: how accurate do these models need to be in order for the reward inference to be accurate?
no code implementations • 30 Nov 2022 • David Zhang, Micah Carroll, Andreea Bobu, Anca Dragan
One of the most successful paradigms for reward learning uses human feedback in the form of comparisons.
1 code implementation • 20 Nov 2022 • Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin
Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks.
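A minimal sketch of the mask-and-predict step itself (the entry above applies the same idea to trajectory sequences in sequential decision problems, with task-specific masking schemes; this toy shows only the corruption step):

```python
import numpy as np

MASK_ID = 0
rng = np.random.default_rng(0)

def mask_tokens(tokens, mask_prob=0.15):
    # Replace a random subset of tokens with a mask id; the model is then
    # trained to reconstruct the original tokens at the masked positions.
    tokens = np.asarray(tokens)
    mask = rng.random(tokens.shape) < mask_prob
    corrupted = np.where(mask, MASK_ID, tokens)
    return corrupted, mask

tokens = rng.integers(1, 100, size=20)
corrupted, mask = mask_tokens(tokens)
print("original: ", tokens)
print("corrupted:", corrupted)
print("targets:  ", tokens[mask])    # prediction targets for the model
```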
no code implementations • 3 Nov 2022 • Mesut Yang, Micah Carroll, Anca Dragan
We show that using optimal behavior as a prior for human models makes these models vastly more data-efficient and able to generalize to new environments.
no code implementations • 28 Apr 2022 • Micah Carroll, Jessy Lin, Orr Paradise, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin
Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks.
no code implementations • 25 Apr 2022 • Micah Carroll, Anca Dragan, Stuart Russell, Dylan Hadfield-Menell
These steps involve two challenging ingredients. Estimation requires anticipating how hypothetical algorithms would influence user preferences if deployed; we do this by using historical user interaction data to train a predictive user model that implicitly captures their preference dynamics. Evaluation and optimization additionally require metrics to assess whether such influences are manipulative or otherwise unwanted; for this we use the notion of "safe shifts", which define a trust region within which behavior is safe. For instance, the natural way in which users would shift without interference from the system could be deemed "safe". A rough sketch of this check appears below.
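A highly simplified sketch of the "safe shifts" check; the user-model dynamics, thresholds, and names here are all hypothetical stand-ins for a learned predictive user model:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_shift(prefs, policy_influence, steps=50):
    # Stand-in for rolling out a learned predictive user model under a policy.
    for _ in range(steps):
        prefs = prefs + 0.01 * rng.normal(size=prefs.shape) + policy_influence
    return prefs

initial = np.array([0.5, 0.5])
natural = simulate_shift(initial, policy_influence=0.0)   # drift with no system interference
candidate = simulate_shift(initial, policy_influence=np.array([0.02, -0.02]))

shift = np.linalg.norm(candidate - natural)
TRUST_RADIUS = 0.1                                        # hypothetical trust-region size
verdict = "safe" if shift <= TRUST_RADIUS else "flagged as potentially manipulative"
print(f"shift from natural drift: {shift:.3f} -> {verdict}")
```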
1 code implementation • ICLR 2022 • Cassidy Laidlaw, Anca Dragan
However, these models fail when humans exhibit systematic suboptimality, i.e., when their deviations from optimal behavior are not independent, but instead consistent over time.
1 code implementation • ACL 2022 • Jessy Lin, Daniel Fried, Dan Klein, Anca Dragan
In classic instruction following, language like "I'd like the JetBlue flight" maps to actions (e.g., selecting that flight).
no code implementations • 12 Nov 2021 • Lawrence Chan, Andrew Critch, Anca Dragan
More importantly, we show that an irrational human, when correctly modelled, can communicate more information about the reward than a perfectly rational human can.
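A toy illustration of why this can hold, assuming Boltzmann noise with a known rationality coefficient (not the paper's full analysis): a perfectly rational agent's choices only reveal which option is better, while a noisy agent's choice frequencies also reveal by how much.

```python
import numpy as np

rng = np.random.default_rng(0)
reward_gap = 0.7                       # true R(a) - R(b); what the learner wants
beta = 1.0                             # known rationality coefficient

# Simulated choices from a Boltzmann-noisy human choosing between a and b.
choices = rng.random(1000) < 1 / (1 + np.exp(-beta * reward_gap))

# Under a perfect-rationality model, any majority of `a` choices is consistent
# with ANY positive gap: the learner only learns gap > 0.
print("rational model learns: gap > 0")

# Under the correct Boltzmann model, inverting the empirical choice frequency
# recovers the gap's magnitude as well.
p_hat = choices.mean()
gap_estimate = np.log(p_hat / (1 - p_hat)) / beta
print(f"Boltzmann model learns: gap ~= {gap_estimate:.2f} (true {reward_gap})")
```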
1 code implementation • 4 Nov 2021 • Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel
However, it is difficult to quantify the progress in preference-based RL due to the lack of a commonly adopted benchmark.
no code implementations • 5 Jul 2021 • Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan
Rather than training AI systems using a predefined reward function or a labeled dataset with a predefined set of categories, we train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes or as the capabilities of the AI system improve.
1 code implementation • ICLR 2021 • David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan
Since reward functions are hard to specify, recent work has focused on learning policies from human feedback.
no code implementations • 19 Jan 2021 • Rachel Freedman, Rohin Shah, Anca Dragan
A promising alternative to manually specifying reward functions is to enable robots to infer them from human feedback, like demonstrations or corrections.
no code implementations • 1 Jan 2021 • Lawrence Chan, Andrew Critch, Anca Dragan
Surprisingly, we find that if we give the learner access to the correct model of the demonstrator's irrationality, these irrationalities can actually help reward inference.
no code implementations • ICLR 2021 • Jensen Gao, Siddharth Reddy, Glen Berseth, Nicholas Hardy, Nikhilesh Natraj, Karunesh Ganguly, Anca Dragan, Sergey Levine
In the typing domain, we leverage backspaces as implicit feedback that the interface did not perform the desired action.
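A minimal sketch of mining that signal; the event schema and labels here are hypothetical, showing only the idea that an interface action the user immediately deletes is treated as not the intended one:

```python
def label_with_backspaces(events):
    """events: list of dicts like {"action": ..., "key": "a" | "<backspace>"}."""
    labels = []
    for prev, nxt in zip(events, events[1:]):
        if prev["key"] != "<backspace>":
            # Implicit feedback: reward 0 if the user deleted the interface's
            # output on the very next keystroke, else 1.
            labels.append((prev["action"], 0.0 if nxt["key"] == "<backspace>" else 1.0))
    return labels

events = [
    {"action": "type_h", "key": "h"},
    {"action": "type_x", "key": "x"},
    {"action": "delete", "key": "<backspace>"},
    {"action": "type_i", "key": "i"},
]
print(label_with_backspaces(events))
# [('type_h', 1.0), ('type_x', 0.0)]
```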
no code implementations • 1 Jan 2021 • Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell
By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.
1 code implementation • NeurIPS 2020 • Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca Dragan
One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s).
2 code implementations • NeurIPS 2019 • Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan
While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves.
no code implementations • ICLR 2019 • Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn
A significant challenge for the practical application of reinforcement learning to real-world problems is the need to specify an oracle reward function that correctly defines a task.
1 code implementation • ICLR 2019 • Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan
We find that information from the initial state can be used to infer both side effects that should be avoided as well as preferences for how the environment should be organized.
1 code implementation • 24 Jan 2019 • Lawrence Chan, Dylan Hadfield-Menell, Siddhartha Srinivasa, Anca Dragan
Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science.
no code implementations • 4 Jan 2019 • Gokul Swamy, Jens Schulz, Rohan Choudhury, Dylan Hadfield-Menell, Anca Dragan
Fundamental to robotics is the debate between model-based and model-free learning: should the robot build an explicit model of the world, or learn a policy directly?
no code implementations • 31 May 2018 • Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn
A significant challenge for the practical application of reinforcement learning in the real world is the need to specify an oracle reward function that correctly defines a task.
1 code implementation • NeurIPS 2017 • Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan
When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios.
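A toy sketch of the resulting inference problem, treating the designer's proxy reward as evidence about the true reward given the training environment. The feature values and the simplified Boltzmann-style likelihood here are illustrative, not the paper's exact formulation:

```python
import numpy as np

# Candidate trajectories in the TRAINING environment, as feature counts over
# [dirt_cleaned, lava_crossed]; crucially, lava never appears in training.
train_phi = np.array([[3.0, 0.0],   # clean lots of dirt
                      [1.0, 0.0]])  # clean little dirt

proxy_w = np.array([1.0, 0.0])      # designer only specified "dirt is good"
beta = 2.0

def proxy_likelihood(true_w):
    # Simplified likelihood: how plausible is it that a designer with true
    # reward true_w would pick a proxy whose optimal training trajectory is
    # `best`? (Boltzmann choice over training trajectories.)
    best = train_phi[np.argmax(train_phi @ proxy_w)]
    logits = beta * (train_phi @ true_w)
    return np.exp(beta * true_w @ best) / np.exp(logits).sum()

# Hypotheses about the true reward: lava may be neutral or very bad.
hypotheses = [np.array([1.0, 0.0]), np.array([1.0, -10.0])]
post = np.array([proxy_likelihood(w) for w in hypotheses])
post /= post.sum()
print(dict(zip(["lava neutral", "lava bad"], post.round(3))))
# Both hypotheses explain the proxy equally well, since lava never appeared in
# training -> the agent should stay uncertain (and cautious) about lava.
```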
1 code implementation • 28 May 2017 • Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell
We show that when a human is not perfectly rational then a robot that tries to infer and act according to the human's underlying preferences can always perform better than a robot that simply follows the human's literal order.
1 code implementation • ACL 2017 • Jacob Andreas, Anca Dragan, Dan Klein
Several approaches have recently been proposed for learning decentralized deep multiagent policies that coordinate via a differentiable communication channel.
2 code implementations • 27 Mar 2017 • Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, Ken Goldberg
One approach to Imitation Learning is Behavior Cloning, in which a robot observes a supervisor and infers a control policy.
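A minimal Behavior Cloning sketch with a linear supervisor (DART, the approach from this line of work, additionally injects optimized noise into the supervisor's demonstrations so the learner sees recovery behavior; that step is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(size=(500, 4))
K_supervisor = rng.normal(size=(4, 2))
actions = obs @ K_supervisor + 0.01 * rng.normal(size=(500, 2))  # supervisor demos

# Behavior Cloning = supervised regression from observations to actions.
K_policy, *_ = np.linalg.lstsq(obs, actions, rcond=None)
print("max weight error:", np.abs(K_policy - K_supervisor).max())
```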
no code implementations • 24 Nov 2016 • Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell
We analyze a simple game between a human H and a robot R, where H can press R's off switch but R can disable the off switch.
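A toy expected-utility computation consistent with the excerpt (a numerical sketch, not the paper's general analysis): when R is uncertain about the human's utility U for its proposed action, deferring to a rational H who vetoes bad actions weakly dominates acting unilaterally.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(loc=0.2, scale=1.0, size=100_000)   # R's belief over H's utility

act_now = U.mean()                 # disable the off switch and act: get E[U]
defer = np.maximum(U, 0).mean()    # leave the switch on; rational H vetoes when U < 0
print(f"E[act] = {act_now:.3f}, E[defer] = {defer:.3f}")  # defer >= act
```

The gap between the two values comes entirely from R's uncertainty about U, which is what gives R a positive incentive to keep its off switch enabled.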
no code implementations • 4 Oct 2016 • Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg
Although policies learned with robot-centric (RC) sampling can be superior to those learned with human-centric (HC) sampling for standard learning models such as linear SVMs, policies learned with HC sampling may be comparable when using highly expressive learning models such as deep learning and hyper-parametric decision trees, which have little model error.
2 code implementations • NeurIPS 2016 • Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell
For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans.