1 code implementation • 5 Mar 2024 • Cassidy Laidlaw, Shivam Singhal, Anca Dragan
Thus, we propose regularizing based on the occupancy measure (OM) divergence between policies, rather than the action distribution (AD) divergence, to prevent reward hacking.
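To make the distinction concrete, here is a toy tabular sketch (made-up dynamics and policies, not the paper's algorithm or benchmarks) contrasting action-distribution divergence with occupancy-measure divergence:

```python
import numpy as np

def occupancy_measure(P, pi, mu0, gamma=0.99):
    """Discounted state-action occupancy measure d(s, a) of policy pi.

    P: transitions, shape (S, A, S); pi: policy, shape (S, A);
    mu0: initial state distribution, shape (S,).
    """
    S = P.shape[0]
    P_pi = np.einsum("sat,sa->st", P, pi)               # state kernel under pi
    d_s = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi                            # shape (S, A)

def tv(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * np.abs(p - q).sum()

# Toy MDP: action 1 in state 0 jumps to an absorbing state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0   # stay in state 0
P[0, 1, 1] = 1.0   # jump to state 1
P[1, :, 1] = 1.0   # state 1 is absorbing
mu0 = np.array([1.0, 0.0])

pi_safe = np.array([[0.99, 0.01], [1.0, 0.0]])
pi_new  = np.array([[0.90, 0.10], [1.0, 0.0]])  # tiny per-state change

# AD divergence: average per-state TV between action distributions.
ad = np.mean([tv(pi_safe[s], pi_new[s]) for s in range(2)])
# OM divergence: TV between state-action occupancy measures.
om = tv(occupancy_measure(P, pi_safe, mu0).ravel(),
        occupancy_measure(P, pi_new, mu0).ravel())
print(f"AD divergence: {ad:.3f}   OM divergence: {om:.3f}")
# A small action-distribution change produces a large shift in long-run
# behavior, which OM regularization penalizes but AD regularization misses.
```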
no code implementations • 15 Dec 2023 • Lauren H. Cooke, Harvey Klyne, Edwin Zhang, Cassidy Laidlaw, Milind Tambe, Finale Doshi-Velez
Inverse reinforcement learning (IRL) is computationally challenging, with common approaches requiring the solution of multiple reinforcement learning (RL) sub-problems.
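For context on why IRL involves repeated RL, here is a schematic feature-matching IRL loop (a generic textbook-style sketch, not this paper's method), in which every outer iteration solves a fresh RL sub-problem:

```python
import numpy as np

def value_iteration(r, P, gamma=0.9, iters=200):
    """The RL sub-problem: compute a greedy policy for reward r on a tabular MDP."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r[:, None] + gamma * (P @ V)   # (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                # deterministic policy

def visitation(pi, P, mu0, gamma=0.9):
    """Discounted state-visitation distribution of a deterministic policy."""
    S = P.shape[0]
    P_pi = P[np.arange(S), pi]             # (S, S) kernel under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)

def irl(expert_visitation, P, mu0, lr=0.5, steps=50):
    """Outer IRL loop: every iteration solves a fresh RL sub-problem."""
    w = np.zeros(P.shape[0])               # reward weights over one-hot state features
    for _ in range(steps):
        pi = value_iteration(w, P)         # <-- the expensive inner RL call
        w += lr * (expert_visitation - visitation(pi, P, mu0))
    return w

# Toy usage: recover a reward explaining an expert who steers into state 1.
P = np.zeros((2, 2, 2)); P[0, 0, 0] = 1; P[0, 1, 1] = 1; P[1, :, 1] = 1
mu0 = np.array([1.0, 0.0])
expert = visitation(np.array([1, 0]), P, mu0)
print(irl(expert, P, mu0))
```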
1 code implementation • 13 Dec 2023 • Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan
Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neural networks.
1 code implementation • 13 Dec 2023 • Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
We prove that standard applications of preference learning, including reinforcement learning from human feedback (RLHF), implicitly aggregate over hidden contexts according to a well-known voting rule called Borda count.
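A small numeric sketch of what aggregation by Borda count means (the utilities and probabilities below are invented for illustration): when preference labels come from hidden contexts, the learned ranking follows each alternative's probability of winning comparisons, which can disagree with expected utility.

```python
import numpy as np

# Hidden contexts z with probabilities, and utilities u(z, alternative).
p_z = np.array([0.6, 0.4])
u = np.array([[1.0, 0.0],    # context z0: A slightly better than B
              [0.0, 10.0]])  # context z1: B much better than A
alts = ["A", "B"]

# Expected utility marginalizes the hidden context before comparing.
expected_utility = p_z @ u                      # [0.6, 4.0] -> B wins

# Preference learning sees only comparisons, each labeled in one context:
# P(a preferred over b) = sum_z p(z) * 1[u(z, a) > u(z, b)]
pref = np.zeros((2, 2))
for a in range(2):
    for b in range(2):
        if a != b:
            pref[a, b] = p_z @ (u[:, a] > u[:, b])

# Borda score: average probability of beating the other alternatives.
borda = pref.sum(axis=1)                        # [0.6, 0.4] -> A wins
print("expected utility:", dict(zip(alts, expected_utility)))
print("Borda score:     ", dict(zip(alts, borda)))
```

Here the Borda winner (A) differs from the expected-utility winner (B), because marginalizing the hidden context before versus after the comparison gives different aggregates.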
1 code implementation • NeurIPS 2023 • Cassidy Laidlaw, Stuart Russell, Anca Dragan
Using BRIDGE, we find that prior bounds do not correlate well with whether deep RL succeeds or fails, but we discover a surprising property that does.
1 code implementation • ICLR 2022 • Cassidy Laidlaw, Anca Dragan
However, these models fail when humans exhibit systematic suboptimality, i.e., when their deviations from optimal behavior are not independent but instead consistent over time.
no code implementations • NeurIPS 2021 • Cassidy Laidlaw, Stuart Russell
We give the first statistical analysis of IDT, providing conditions necessary to identify these preferences and characterizing the sample complexity: the number of decisions that must be observed to learn the tradeoff the human is making to a desired precision.
2 code implementations • ICLR 2021 • Cassidy Laidlaw, Sahil Singla, Soheil Feizi
We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images.
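As a sketch of checking the NPTM constraint in practice, assuming the open-source lpips package as the neural perceptual distance (the bound below is arbitrary, not the paper's setting):

```python
import torch
import lpips  # pip install lpips; neural perceptual distance

dist_fn = lpips.LPIPS(net='alex')

def in_neural_perceptual_ball(x_nat, x_adv, bound=0.5):
    """Check the NPTM constraint: is x_adv within a bounded neural
    perceptual distance of the natural image x_nat?

    Both images are NCHW tensors scaled to [-1, 1], per the lpips API.
    """
    with torch.no_grad():
        d = dist_fn(x_nat, x_adv).item()
    return d <= bound, d

# Example with random images as stand-ins for real data.
x = torch.rand(1, 3, 64, 64) * 2 - 1
x_adv = (x + 0.05 * torch.randn_like(x)).clamp(-1, 1)
ok, d = in_neural_perceptual_ball(x, x_adv)
print(f"LPIPS distance {d:.4f}, within bound: {ok}")
```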
no code implementations • 25 Nov 2019 • Cassidy Laidlaw, Soheil Feizi
We explore adversarial robustness in the setting in which it is acceptable for a classifier to abstain (that is, output no class) on adversarial examples.
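A minimal sketch of a classifier with an abstain option, using a simple confidence threshold (the paper's analysis is more general than this particular rule):

```python
import numpy as np

ABSTAIN = -1  # sentinel "no class" output

def predict_with_abstain(logits, threshold=0.9):
    """Output a class only when softmax confidence exceeds the threshold;
    otherwise abstain."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    return np.where(conf >= threshold, preds, ABSTAIN)

logits = np.array([[4.0, 0.1, 0.2],    # confident -> class 0
                   [1.0, 0.9, 1.1]])   # ambiguous -> abstain
print(predict_with_abstain(logits))    # [ 0 -1]
```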
1 code implementation • NeurIPS 2019 • Cassidy Laidlaw, Soheil Feizi
For simplicity, we refer to functional adversarial attacks on image colors as ReColorAdv, which is the main focus of our experiments.
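As a sketch of the "functional" idea, here is a minimal color attack where a single bounded transform is applied to every pixel, so identical colors map to identical colors (ReColorAdv itself uses a more flexible color parameterization; this is only illustrative):

```python
import torch
import torch.nn.functional as F

def functional_color_attack(model, x, y, steps=20, lr=0.05, eps=0.1):
    """Learn one per-channel affine color map f and apply it everywhere,
    ascending the classification loss while keeping f close to identity."""
    scale = torch.zeros(1, 3, 1, 1, requires_grad=True)
    shift = torch.zeros(1, 3, 1, 1, requires_grad=True)
    for _ in range(steps):
        x_adv = ((1 + scale) * x + shift).clamp(0, 1)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            for p in (scale, shift):
                p += lr * p.grad.sign()      # ascend the loss
                p.clamp_(-eps, eps)          # keep the transform small
                p.grad.zero_()
    return ((1 + scale) * x + shift).clamp(0, 1).detach()

# Toy usage with an untrained model as a stand-in.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
x_adv = functional_color_attack(model, x, y)
```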
1 code implementation • CVPR 2019 • Daniel Cudeiro, Timo Bolkart, Cassidy Laidlaw, Anurag Ranjan, Michael J. Black
To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers.