no code implementations • 2 Dec 2022 • Kaustubh Sridhar, Vikramank Singh, Balakrishnan Narayanaswamy, Abishek Sankararaman
PnC jointly trains a prediction model and a terminal Q function that approximates cost-to-go over a long horizon, by back-propagating the cost of decisions through the optimization problem and from the future.
no code implementations • 29 Sep 2021 • David C Jenkins, René Arendt Sørensen, Vikramank Singh, Philip Kaminsky, Anil Aswani, Ramakrishna Akella
This paper proposes a novel method based on Deep Reinforcement Learning for developing dynamic scheduling policies through interaction with simulated stochastic manufacturing systems.