ScheduleNet: Learn to Solve MinMax mTSP Using Reinforcement Learning with Delayed Reward

1 Jan 2021 · Junyoung Park, Sanzhar Bakhtiyarov, Jinkyoo Park ·

Combinatorial Optimization (CO) problems are theoretically challenging yet crucial in practice. Numerous works used Reinforcement Learning (RL) to tackle these CO problems. As current approaches mainly focus on single-worker CO problems such as the famous Travelling Salesman Problem (TSP), we focus on more practical extension of TSP to multi-worker (salesmen) setting, specifically MinMax mTSP. From the RL perspective, Minmax mTSP raises several significant challenges, such as the cooperation of multiple workers and the need for a well-engineered reward function. In this paper, we present the RL framework with (1) worker-task heterograph and type-aware Graph Neural Network, and (2) the RL training method that is stable, has fast convergence speed, and directly optimizes the objective of MinMax mTSP in a delayed reward setting. We achieve comparable performance to a highly optimized meta-heuristic baseline, OR-Tools, and outperforms it in 10% of the cases, both on in-training and out-of-training problem distributions. Moreover, our problem formulation enables us to solve problems with any number of salesmen (workers) and cities.

PDF Abstract