Value Prediction Network

NeurIPS 2017  ·  Junhyuk Oh, Satinder Singh, Honglak Lee

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
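To make the architecture described above concrete, the following is a minimal, hypothetical PyTorch sketch of the four VPN components implied by the abstract: an encoding module (observation to abstract state), a value module, an outcome module (option-conditional reward and discount), and a transition module (option-conditional next abstract state), plus a simplified greedy short-lookahead planning backup. All module names, layer sizes, and the planning recursion here are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of a Value Prediction Network, assuming flat observations
# and a small discrete option set; sizes and the backup rule are illustrative.
import torch
import torch.nn as nn


class ValuePredictionNetwork(nn.Module):
    def __init__(self, obs_dim, state_dim, num_options):
        super().__init__()
        # Encoding module: observation -> abstract state
        self.encode = nn.Sequential(nn.Linear(obs_dim, state_dim), nn.ReLU())
        # Value module: abstract state -> scalar value V(s)
        self.value = nn.Linear(state_dim, 1)
        # Outcome module: (abstract state, option) -> predicted reward and discount
        self.outcome = nn.Linear(state_dim + num_options, 2)
        # Transition module: (abstract state, option) -> next abstract state
        self.transition = nn.Sequential(
            nn.Linear(state_dim + num_options, state_dim), nn.ReLU()
        )
        self.num_options = num_options

    def step(self, s, option):
        """Predict reward, discount, and next abstract state for one option."""
        o = torch.nn.functional.one_hot(option, self.num_options).float()
        x = torch.cat([s, o], dim=-1)
        reward, discount_logit = self.outcome(x).unbind(dim=-1)
        discount = torch.sigmoid(discount_logit)  # keep predicted discount in (0, 1)
        next_s = self.transition(x)
        return reward, discount, next_s

    def plan(self, obs, depth=3):
        """Greedy short-lookahead planning over options (simplified backup)."""
        s = self.encode(obs)
        q = torch.stack(
            [self._q(s, torch.tensor([o]), depth) for o in range(self.num_options)]
        )
        return q.argmax().item(), q

    def _q(self, s, option, depth):
        # Roll the learned dynamics forward in abstract-state space and back up
        # predicted rewards/discounts; bottom out with the learned value module.
        reward, discount, next_s = self.step(s, option)
        if depth == 1:
            backup = self.value(next_s).squeeze(-1)
        else:
            backup = torch.stack(
                [self._q(next_s, torch.tensor([o]), depth - 1)
                 for o in range(self.num_options)]
            ).max(dim=0).values
        return reward + discount * backup
```

Usage would look like `ValuePredictionNetwork(obs_dim=64, state_dim=128, num_options=4).plan(torch.zeros(1, 64), depth=3)`. Note that the paper's d-step planning combines the learned value estimate with the best rollout value at each depth; the sketch above uses a plain max-backup for brevity.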


Results from the Paper


| Task        | Dataset                   | Model | Metric Name | Metric Value | Global Rank |
|-------------|---------------------------|-------|-------------|--------------|-------------|
| Atari Games | Atari 2600 Alien          | VPN   | Score       | 1429         | # 34        |
| Atari Games | Atari 2600 Amidar         | VPN   | Score       | 641          | # 30        |
| Atari Games | Atari 2600 Crazy Climber  | VPN   | Score       | 54119        | # 42        |
| Atari Games | Atari 2600 Enduro         | VPN   | Score       | 382          | # 36        |
| Atari Games | Atari 2600 Frostbite      | VPN   | Score       | 3811         | # 22        |
| Atari Games | Atari 2600 Krull          | VPN   | Score       | 15930        | # 9         |
| Atari Games | Atari 2600 Ms. Pacman     | VPN   | Score       | 2689         | # 30        |
| Atari Games | Atari 2600 Q*Bert         | VPN   | Score       | 14517        | # 29        |
| Atari Games | Atari 2600 Seaquest       | VPN   | Score       | 5628         | # 31        |

Methods


No methods listed for this paper.