Reinforcement Learning for Predict+Optimize
Predict+Optimize (P+O) is a machine learning framework for optimization problems with unknown parameters. This paper presents a framework for tackling P+O problems with neural networks and reinforcement learning. We focus on the traveling salesman problem and train a recurrent neural network that, given a directed graph, predicts a distribution over edge permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent neural network with a policy gradient method.
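The training signal described above can be illustrated with a minimal sketch. It is not the paper's model: a table of edge logits (`theta`) stands in for the recurrent neural network, the update is plain REINFORCE with a moving-average baseline, and every name here (`tour_length`, `sample_tour`, `train`, the learning rate, the baseline decay) is an assumption made for illustration only.

```python
import math
import random


def tour_length(dist, tour):
    """Length of the closed tour that visits the cities in the given order."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def sample_tour(theta, rng):
    """Sample a tour starting at city 0.

    Returns the tour and the gradient of its log-probability
    with respect to the edge logits theta.
    """
    n = len(theta)
    grad = [[0.0] * n for _ in range(n)]
    tour, current = [0], 0
    unvisited = list(range(1, n))
    while unvisited:
        # Softmax over the logits of edges that lead to unvisited cities.
        top = max(theta[current][k] for k in unvisited)
        weights = [math.exp(theta[current][k] - top) for k in unvisited]
        total = sum(weights)
        probs = [w / total for w in weights]
        choice = rng.choices(unvisited, weights=probs)[0]
        # d log p(choice) / d theta[current][k] = 1[k == choice] - p_k
        for k, p in zip(unvisited, probs):
            grad[current][k] += (1.0 if k == choice else 0.0) - p
        tour.append(choice)
        unvisited.remove(choice)
        current = choice
    return tour, grad


def train(dist, steps=2000, lr=0.1, seed=0):
    """REINFORCE on edge logits with negative tour length as the reward."""
    rng = random.Random(seed)
    n = len(dist)
    theta = [[0.0] * n for _ in range(n)]
    baseline = None
    for _ in range(steps):
        tour, grad = sample_tour(theta, rng)
        reward = -tour_length(dist, tour)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        # Moving-average baseline to reduce gradient variance (an assumption).
        baseline = 0.9 * baseline + 0.1 * reward
        for i in range(n):
            for j in range(n):
                theta[i][j] += lr * advantage * grad[i][j]
    return theta
```

Here `dist` is a square matrix of directed edge costs, so `dist[i][j]` need not equal `dist[j][i]`. Calling `train(dist)` returns the learned logits, and `sample_tour(theta, random.Random(0))` draws a tour from the resulting policy.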