no code implementations • 28 Jun 2021 • Ingmar Kanitscheider, Joost Huizinga, David Farhi, William Hebgen Guss, Brandon Houghton, Raul Sampedro, Peter Zhokhov, Bowen Baker, Adrien Ecoffet, Jie Tang, Oleg Klimov, Jeff Clune
An important challenge in reinforcement learning is training agents that can solve a wide variety of tasks.
3 code implementations • 9 Sep 2020 • Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman
We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases.
Ranked #1 on Reinforcement Learning (RL) on ProcGen (using extra training data)
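The phase separation PPG describes can be sketched as a training schedule. A toy sketch in Python, with hypothetical names (`PhasicTrainer`, `n_pi`, `e_aux`) chosen for illustration: in the real method the auxiliary phase distills value-function features into the policy network under a KL constraint, which this sketch reduces to simply freezing the policy and counting updates.

```python
class PhasicTrainer:
    """Toy schedule sketch of Phasic Policy Gradient (PPG).

    The policy phase runs on-policy iterations that update both the
    policy and its value estimate; the auxiliary phase then performs
    extra value training on the stored rollouts while leaving the
    policy untouched (a stand-in for PPG's KL-constrained distillation).
    """

    def __init__(self):
        self.policy_updates = 0
        self.value_updates = 0

    def policy_phase(self, n_iters):
        # On-policy (PPO-style) iterations: policy and value train jointly.
        for _ in range(n_iters):
            self.policy_updates += 1
            self.value_updates += 1

    def auxiliary_phase(self, n_epochs):
        # Extra value-function optimization; policy parameters are frozen.
        for _ in range(n_epochs):
            self.value_updates += 1

    def train(self, n_phases, n_pi=4, e_aux=6):
        # Alternate the two distinct phases, as PPG prescribes.
        for _ in range(n_phases):
            self.policy_phase(n_pi)
            self.auxiliary_phase(e_aux)
```

The point of the separation is that the value function can be trained harder (more epochs, shared features) without that extra optimization pressure degrading the policy.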
1 code implementation • 6 Dec 2018 • Karl Cobbe, Oleg Klimov, Chris Hesse, Tae-hoon Kim, John Schulman
In this paper, we investigate the problem of overfitting in deep reinforcement learning.
21 code implementations • ICLR 2019 • Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov
In particular, we establish state-of-the-art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods.
3 code implementations • 10 Apr 2018 • Alex Nichol, Vicki Pfau, Christopher Hesse, Oleg Klimov, John Schulman
In this report, we present a new reinforcement learning (RL) benchmark based on the Sonic the Hedgehog (TM) video game franchise.
170 code implementations • 20 Jul 2017 • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
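The alternation described here, sampling data with the current policy and then ascending a surrogate objective, can be illustrated with the clipped surrogate loss used by the PPO-Clip variant. A minimal NumPy sketch; the function name and the `eps=0.2` default are illustrative choices, not taken from this abstract:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate objective (to be maximized by gradient ascent).

    ratio:     pi_new(a|s) / pi_old(a|s), one entry per sampled transition
    advantage: estimated advantage for each transition
    eps:       clip range limiting how far the new policy may move
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum yields a pessimistic bound, so large
    # policy updates are not rewarded beyond the clip range.
    return np.mean(np.minimum(unclipped, clipped))
```

For a positive advantage, a probability ratio above `1 + eps` earns no extra objective value, which is what keeps each update step conservative.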