1 code implementation • 5 Mar 2024 • Cassidy Laidlaw, Shivam Singhal, Anca Dragan
Thus, we propose regularizing based on the occupancy measure (OM) divergence between policies, rather than the action distribution (AD) divergence, to prevent reward hacking.
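To make the distinction concrete, here is a toy tabular sketch (made-up dynamics and policies, not the paper's algorithm or benchmarks) contrasting action-distribution divergence with occupancy-measure divergence:

```python
import numpy as np

def occupancy_measure(P, pi, mu0, gamma=0.99):
    """Discounted state-action occupancy measure d(s, a) of policy pi.

    P: transitions, shape (S, A, S); pi: policy, shape (S, A);
    mu0: initial state distribution, shape (S,).
    """
    S = P.shape[0]
    P_pi = np.einsum("sat,sa->st", P, pi)               # state kernel under pi
    d_s = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi                            # shape (S, A)

def tv(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * np.abs(p - q).sum()

# Toy MDP: action 1 in state 0 jumps to an absorbing state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0   # stay in state 0
P[0, 1, 1] = 1.0   # jump to state 1
P[1, :, 1] = 1.0   # state 1 is absorbing
mu0 = np.array([1.0, 0.0])

pi_safe = np.array([[0.99, 0.01], [1.0, 0.0]])
pi_new  = np.array([[0.90, 0.10], [1.0, 0.0]])  # tiny per-state change

# AD divergence: average per-state TV between action distributions.
ad = np.mean([tv(pi_safe[s], pi_new[s]) for s in range(2)])
# OM divergence: TV between state-action occupancy measures.
om = tv(occupancy_measure(P, pi_safe, mu0).ravel(),
        occupancy_measure(P, pi_new, mu0).ravel())
print(f"AD divergence: {ad:.3f}   OM divergence: {om:.3f}")
# A small action-distribution change produces a large shift in long-run
# behavior, which OM regularization penalizes but AD regularization misses.
```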
no code implementations • 15 Dec 2023 • Lauren H. Cooke, Harvey Klyne, Edwin Zhang, Cassidy Laidlaw, Milind Tambe, Finale Doshi-Velez
Inverse reinforcement learning (IRL) is computationally challenging, with common approaches requiring the solution of multiple reinforcement learning (RL) sub-problems.
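For context on why IRL involves repeated RL, here is a schematic feature-matching IRL loop (a generic textbook-style sketch, not this paper's method), in which every outer iteration solves a fresh RL sub-problem:

```python
import numpy as np

def value_iteration(r, P, gamma=0.9, iters=200):
    """The RL sub-problem: compute a greedy policy for reward r on a tabular MDP."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r[:, None] + gamma * (P @ V)   # (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                # deterministic policy

def visitation(pi, P, mu0, gamma=0.9):
    """Discounted state-visitation distribution of a deterministic policy."""
    S = P.shape[0]
    P_pi = P[np.arange(S), pi]             # (S, S) kernel under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)

def irl(expert_visitation, P, mu0, lr=0.5, steps=50):
    """Outer IRL loop: every iteration solves a fresh RL sub-problem."""
    w = np.zeros(P.shape[0])               # reward weights over one-hot state features
    for _ in range(steps):
        pi = value_iteration(w, P)         # <-- the expensive inner RL call
        w += lr * (expert_visitation - visitation(pi, P, mu0))
    return w

# Toy usage: recover a reward explaining an expert who steers into state 1.
P = np.zeros((2, 2, 2)); P[0, 0, 0] = 1; P[0, 1, 1] = 1; P[1, :, 1] = 1
mu0 = np.array([1.0, 0.0])
expert = visitation(np.array([1, 0]), P, mu0)
print(irl(expert, P, mu0))
```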
1 code implementation • 13 Dec 2023 • Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan
Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neural networks.
1 code implementation • 13 Dec 2023 • Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
We prove that standard applications of preference learning, including reinforcement learning from human feedback (RLHF), implicitly aggregate over hidden contexts according to a well-known voting rule called Borda count.
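A small numeric sketch of what aggregation by Borda count means (the utilities and probabilities below are invented for illustration): when preference labels come from hidden contexts, the learned ranking follows each alternative's probability of winning comparisons, which can disagree with expected utility.

```python
import numpy as np

# Hidden contexts z with probabilities, and utilities u(z, alternative).
p_z = np.array([0.6, 0.4])
u = np.array([[1.0, 0.0],    # context z0: A slightly better than B
              [0.0, 10.0]])  # context z1: B much better than A
alts = ["A", "B"]

# Expected utility marginalizes the hidden context before comparing.
expected_utility = p_z @ u                      # [0.6, 4.0] -> B wins

# Preference learning sees only comparisons, each labeled in one context:
# P(a preferred over b) = sum_z p(z) * 1[u(z, a) > u(z, b)]
pref = np.zeros((2, 2))
for a in range(2):
    for b in range(2):
        if a != b:
            pref[a, b] = p_z @ (u[:, a] > u[:, b])

# Borda score: average probability of beating the other alternatives.
borda = pref.sum(axis=1)                        # [0.6, 0.4] -> A wins
print("expected utility:", dict(zip(alts, expected_utility)))
print("Borda score:     ", dict(zip(alts, borda)))
```

Here the Borda winner (A) differs from the expected-utility winner (B), because marginalizing the hidden context before versus after the comparison gives different aggregates.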
1 code implementation • NeurIPS 2023 • Cassidy Laidlaw, Stuart Russell, Anca Dragan
Using BRIDGE, we find that prior bounds do not correlate well with whether deep RL succeeds or fails, but we discover a surprising property that does.
1 code implementation • ICLR 2022 • Cassidy Laidlaw, Anca Dragan
However, these models fail when humans exhibit systematic suboptimality, i.e., when their deviations from optimal behavior are not independent but instead consistent over time.
no code implementations • NeurIPS 2021 • Cassidy Laidlaw, Stuart Russell
We give the first statistical analysis of IDT, providing conditions necessary to identify these preferences and characterizing the sample complexity: the number of decisions that must be observed to learn the tradeoff the human is making to a desired precision.
2 code implementations • ICLR 2021 • Cassidy Laidlaw, Sahil Singla, Soheil Feizi
We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images.
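As a sketch of checking the NPTM constraint in practice, assuming the open-source lpips package as the neural perceptual distance (the bound below is arbitrary, not the paper's setting):

```python
import torch
import lpips  # pip install lpips; neural perceptual distance

dist_fn = lpips.LPIPS(net='alex')

def in_neural_perceptual_ball(x_nat, x_adv, bound=0.5):
    """Check the NPTM constraint: is x_adv within a bounded neural
    perceptual distance of the natural image x_nat?

    Both images are NCHW tensors scaled to [-1, 1], per the lpips API.
    """
    with torch.no_grad():
        d = dist_fn(x_nat, x_adv).item()
    return d <= bound, d

# Example with random images as stand-ins for real data.
x = torch.rand(1, 3, 64, 64) * 2 - 1
x_adv = (x + 0.05 * torch.randn_like(x)).clamp(-1, 1)
ok, d = in_neural_perceptual_ball(x, x_adv)
print(f"LPIPS distance {d:.4f}, within bound: {ok}")
```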
no code implementations • 25 Nov 2019 • Cassidy Laidlaw, Soheil Feizi
We explore adversarial robustness in the setting in which it is acceptable for a classifier to abstain (that is, output no class) on adversarial examples.
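A minimal sketch of a classifier with an abstain option, using a simple confidence threshold (the paper's analysis is more general than this particular rule):

```python
import numpy as np

ABSTAIN = -1  # sentinel "no class" output

def predict_with_abstain(logits, threshold=0.9):
    """Output a class only when softmax confidence exceeds the threshold;
    otherwise abstain."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    return np.where(conf >= threshold, preds, ABSTAIN)

logits = np.array([[4.0, 0.1, 0.2],    # confident -> class 0
                   [1.0, 0.9, 1.1]])   # ambiguous -> abstain
print(predict_with_abstain(logits))    # [ 0 -1]
```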
1 code implementation • NeurIPS 2019 • Cassidy Laidlaw, Soheil Feizi
For simplicity, we refer to functional adversarial attacks on image colors as ReColorAdv, which is the main focus of our experiments.
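As a sketch of the "functional" idea, here is a minimal color attack where a single bounded transform is applied to every pixel, so identical colors map to identical colors (ReColorAdv itself uses a more flexible color parameterization; this is only illustrative):

```python
import torch
import torch.nn.functional as F

def functional_color_attack(model, x, y, steps=20, lr=0.05, eps=0.1):
    """Learn one per-channel affine color map f and apply it everywhere,
    ascending the classification loss while keeping f close to identity."""
    scale = torch.zeros(1, 3, 1, 1, requires_grad=True)
    shift = torch.zeros(1, 3, 1, 1, requires_grad=True)
    for _ in range(steps):
        x_adv = ((1 + scale) * x + shift).clamp(0, 1)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            for p in (scale, shift):
                p += lr * p.grad.sign()      # ascend the loss
                p.clamp_(-eps, eps)          # keep the transform small
                p.grad.zero_()
    return ((1 + scale) * x + shift).clamp(0, 1).detach()

# Toy usage with an untrained model as a stand-in.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
x_adv = functional_color_attack(model, x, y)
```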
1 code implementation • CVPR 2019 • Daniel Cudeiro, Timo Bolkart, Cassidy Laidlaw, Anurag Ranjan, Michael J. Black
To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers.