no code implementations • 28 Jun 2021 • Ingmar Kanitscheider, Joost Huizinga, David Farhi, William Hebgen Guss, Brandon Houghton, Raul Sampedro, Peter Zhokhov, Bowen Baker, Adrien Ecoffet, Jie Tang, Oleg Klimov, Jeff Clune
An important challenge in reinforcement learning is training agents that can solve a wide variety of tasks.
3 code implementations • 9 Sep 2020 • Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman
We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases.
Ranked #1 on Reinforcement Learning (RL) on ProcGen (using extra training data)
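The phase separation PPG describes can be sketched as a training schedule. A toy sketch in Python, with hypothetical names (`PhasicTrainer`, `n_pi`, `e_aux`) chosen for illustration: in the real method the auxiliary phase distills value-function features into the policy network under a KL constraint, which this sketch reduces to simply freezing the policy and counting updates.

```python
class PhasicTrainer:
    """Toy schedule sketch of Phasic Policy Gradient (PPG).

    The policy phase runs on-policy iterations that update both the
    policy and its value estimate; the auxiliary phase then performs
    extra value training on the stored rollouts while leaving the
    policy untouched (a stand-in for PPG's KL-constrained distillation).
    """

    def __init__(self):
        self.policy_updates = 0
        self.value_updates = 0

    def policy_phase(self, n_iters):
        # On-policy (PPO-style) iterations: policy and value train jointly.
        for _ in range(n_iters):
            self.policy_updates += 1
            self.value_updates += 1

    def auxiliary_phase(self, n_epochs):
        # Extra value-function optimization; policy parameters are frozen.
        for _ in range(n_epochs):
            self.value_updates += 1

    def train(self, n_phases, n_pi=4, e_aux=6):
        # Alternate the two distinct phases, as PPG prescribes.
        for _ in range(n_phases):
            self.policy_phase(n_pi)
            self.auxiliary_phase(e_aux)
```

The point of the separation is that the value function can be trained harder (more epochs, shared features) without that extra optimization pressure degrading the policy.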
1 code implementation • 6 Dec 2018 • Karl Cobbe, Oleg Klimov, Chris Hesse, Tae-hoon Kim, John Schulman
In this paper, we investigate the problem of overfitting in deep reinforcement learning.
21 code implementations • ICLR 2019 • Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov
In particular, we establish state-of-the-art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods.
3 code implementations • 10 Apr 2018 • Alex Nichol, Vicki Pfau, Christopher Hesse, Oleg Klimov, John Schulman
In this report, we present a new reinforcement learning (RL) benchmark based on the Sonic the Hedgehog (TM) video game franchise.
170 code implementations • 20 Jul 2017 • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
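The alternation described here, sampling data with the current policy and then ascending a surrogate objective, can be illustrated with the clipped surrogate loss used by the PPO-Clip variant. A minimal NumPy sketch; the function name and the `eps=0.2` default are illustrative choices, not taken from this abstract:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate objective (to be maximized by gradient ascent).

    ratio:     pi_new(a|s) / pi_old(a|s), one entry per sampled transition
    advantage: estimated advantage for each transition
    eps:       clip range limiting how far the new policy may move
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum yields a pessimistic bound, so large
    # policy updates are not rewarded beyond the clip range.
    return np.mean(np.minimum(unclipped, clipped))
```

For a positive advantage, a probability ratio above `1 + eps` earns no extra objective value, which is what keeps each update step conservative.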