no code implementations • 13 Jul 2018 • Nikhil Balaji, Stefan Kiefer, Petr Novotný, Guillermo A. Pérez, Mahsa Shirmohammadi
We show that, given a horizon $n$ in binary and an MDP, computing an optimal policy is EXP-complete, thus resolving an open problem that goes back to the seminal 1987 paper on the complexity of MDPs by Papadimitriou and Tsitsiklis.