no code implementations • ICML 2020 • Adam Stooke, Joshua Achiam, Pieter Abbeel
This intuition leads to our introduction of PID control for the Lagrange multiplier in constrained RL, which we cast as a dynamical system.
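To illustrate the idea, here is a minimal sketch of a PID-style update for the Lagrange multiplier, where the constraint violation (measured cost minus a cost limit) plays the role of the controller's error signal. The gains, the cost limit, and the clipping choices below are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact algorithm): treat the constraint
# violation as the error signal of a PID controller and use the controller
# output as the Lagrange multiplier.
class PIDLagrangeMultiplier:
    def __init__(self, kp=0.1, ki=0.01, kd=0.1, cost_limit=25.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, episode_cost):
        error = episode_cost - self.cost_limit           # constraint violation
        self.integral = max(0.0, self.integral + error)  # keep integral term nonnegative
        derivative = max(0.0, error - self.prev_error)   # react only to rising cost (assumed)
        self.prev_error = error
        # The multiplier must stay nonnegative for a valid Lagrangian relaxation.
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)
```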
no code implementations • 25 Jul 2022 • Heidy Khlaaf, Pamela Mishkin, Joshua Achiam, Gretchen Krueger, Miles Brundage
Codex, a large language model (LLM) trained on a variety of codebases, exceeds the previous state of the art in its capacity to synthesize and generate code.
no code implementations • 8 Jul 2020 • Adam Stooke, Joshua Achiam, Pieter Abbeel
Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when these methods are applied to safe reinforcement learning, lead to constraint-violating behavior during agent training.
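For contrast with the PID view above, the standard Lagrangian approach updates the multiplier by gradient ascent on the dual variable, which acts like integral-only control on the constraint violation. A minimal sketch (step size and cost limit are illustrative assumptions):

```python
# Plain dual gradient ascent on the multiplier: integral-only control on the
# constraint violation, the source of the oscillation and overshoot described above.
def dual_ascent_step(lmbda, episode_cost, cost_limit=25.0, lr=0.01):
    violation = episode_cost - cost_limit
    return max(0.0, lmbda + lr * violation)  # project back to lambda >= 0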
no code implementations • 21 Mar 2019 • Joshua Achiam, Ethan Knight, Pieter Abbeel
Deep Q-Learning (DQL), a family of temporal difference algorithms for control, employs three techniques collectively known as the "deadly triad" in reinforcement learning: bootstrapping, off-policy learning, and function approximation.
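All three ingredients are visible in a standard one-step Q-learning loss; a minimal sketch (network shapes and hyperparameters are illustrative assumptions):

```python
import torch

# Minimal sketch of a DQL-style update exhibiting all three ingredients:
# function approximation (a neural network q_net), bootstrapping (the target
# uses the network's own next-state estimate), and off-policy learning (the
# transition comes from a replay buffer, not the current policy).
def dql_loss(q_net, batch, gamma=0.99):
    s, a, r, s_next, done = batch  # sampled from a replay buffer (off-policy)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        bootstrap = q_net(s_next).max(dim=1).values  # bootstrapped target
        target = r + gamma * (1.0 - done) * bootstrap
    return torch.nn.functional.mse_loss(q_sa, target)
```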
no code implementations • 26 Jul 2018 • Joshua Achiam, Harrison Edwards, Dario Amodei, Pieter Abbeel
We explore methods for option discovery based on variational inference and make two algorithmic contributions.
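As background (not the paper's exact objective), variational option-discovery objectives commonly take the form of a lower bound on the mutual information between a sampled context $c$ and the trajectory $\tau$ it produces, using a learned decoder $q_\phi$ in place of the intractable posterior:

```latex
% Barber-Agakov-style variational lower bound on context-trajectory
% mutual information, with a learned decoder q_phi:
I(c; \tau) \;\ge\; \mathbb{E}_{c \sim p(c),\; \tau \sim \pi_\theta(\cdot \mid c)}
\big[ \log q_\phi(c \mid \tau) \big] + \mathcal{H}(c)
```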
13 code implementations • 8 Mar 2018 • Alex Nichol, Joshua Achiam, John Schulman
This paper considers meta-learning problems, where there is a distribution of tasks, and we would like to obtain an agent that performs well (i.e., learns quickly) when presented with a previously unseen task sampled from this distribution.
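A minimal sketch of one first-order approach to this setting (the inner-loop optimizer, step counts, and step sizes below are illustrative assumptions, not a definitive implementation): repeatedly sample a task, take a few gradient steps on it, and move the initialization toward the adapted parameters.

```python
import copy

# Sketch of a first-order meta-update: adapt a copy of the parameters on a
# sampled task, then interpolate the initialization toward the adapted weights.
def meta_train(params, sample_task, inner_step, n_iters=1000,
               inner_steps=5, meta_lr=0.1):
    for _ in range(n_iters):
        task = sample_task()                    # task ~ task distribution
        adapted = copy.deepcopy(params)
        for _ in range(inner_steps):
            adapted = inner_step(adapted, task) # e.g. one SGD step on the task loss
        # Move the initialization toward the task-adapted parameters.
        params = [p + meta_lr * (a - p) for p, a in zip(params, adapted)]
    return params
```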
9 code implementations • ICML 2017 • Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel
For many applications of reinforcement learning, it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function alone.
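The resulting problem is a constrained MDP: maximize expected return subject to bounds on expected costs. In standard notation (the symbols follow the usual constrained-RL convention rather than any one paper's):

```latex
% Constrained policy search: maximize return subject to cost constraints.
\max_{\pi} \; J(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[ \textstyle\sum_t \gamma^t r(s_t, a_t) \Big]
\quad \text{s.t.} \quad
J_{C_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[ \textstyle\sum_t \gamma^t c_i(s_t, a_t) \Big] \le d_i,
\;\; i = 1, \dots, m
```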
no code implementations • 6 Mar 2017 • Joshua Achiam, Shankar Sastry
Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards.
no code implementations • 29 Feb 2016 • Joshua Achiam
A key problem in reinforcement learning for control with general function approximators (such as deep neural networks and other nonlinear functions) is that, for many algorithms employed in practice, updates to the policy or $Q$-function may fail to improve performance, or worse, may actually cause the policy performance to degrade.