1 code implementation • 21 Feb 2024 • Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul McVay, Michael Rabbat, Yuandong Tian
We fine-tune this model to obtain Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than the $A^*$ implementation it was initially trained on.
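The key idea is to train the Transformer on the execution trace of a symbolic $A^*$ solver, not only on the final plan. A minimal sketch of how such a trace could be generated on a toy grid, assuming a hypothetical `create`/`close` event vocabulary (the paper's actual Sokoban token format may differ):

```python
import heapq

def astar_trace(grid, start, goal):
    """Run A* on a 4-connected grid (0 = free, 1 = wall) and log every
    search step as a symbolic event, mimicking the kind of execution
    trace a Transformer could be trained on. The ('create'/'close',
    position, cost) vocabulary is an illustrative serialization."""
    def h(p):  # Manhattan-distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    trace = []                        # one event per A* step
    parent_of = {start: None}
    g_best = {start: 0}
    open_heap = [(h(start), 0, start)]
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if g > g_best.get(node, float("inf")):
            continue                  # stale heap entry
        trace.append(("close", node, g, f - g))
        if node == goal:
            break
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            ng = g + 1
            if ng < g_best.get(nxt, float("inf")):
                g_best[nxt] = ng
                parent_of[nxt] = node
                trace.append(("create", nxt, ng, h(nxt)))
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))

    # reconstruct the optimal plan by walking parents back from the goal
    plan, cur = [], goal
    while cur is not None:
        plan.append(cur)
        cur = parent_of[cur]
    return trace, plan[::-1]
```

Flattening the trace events followed by the plan yields one token sequence per puzzle; fine-tuning toward sequences that reach the optimal plan in fewer search steps is what lets the model undercut the $A^*$ teacher's step count.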
2 code implementations • 18 Jul 2022 • Zihan Ding, DiJia Su, Qinghua Liu, Chi Jin
This paper proposes new, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games.
no code implementations • 8 Jun 2022 • DiJia Su, Bertrand Douillard, Rami Al-Rfou, Cheolho Park, Benjamin Sapp
These models are intrinsically invariant to translation and rotation between scene elements and perform best on public leaderboards, but they scale quadratically with the number of agents and scene elements.
no code implementations • 29 Sep 2021 • DiJia Su, Jason D. Lee, John Mulvey, H. Vincent Poor
In the high-support region (low uncertainty), we take an aggressive policy update.
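A minimal NumPy sketch of this idea, gating the policy step size by disagreement across an ensemble of Q-estimates (an illustrative form; the paper's exact objective and uncertainty estimator may differ):

```python
import numpy as np

def uncertainty_scaled_step(q_ensemble, base_lr=1.0, scale=0.5):
    """Scale the policy update step by ensemble disagreement.
    Low std across the ensemble (high support, low uncertainty)
    keeps a near-full, aggressive step; high std damps it toward
    a conservative step. Hypothetical gating rule, not the
    paper's exact formulation."""
    std = np.std(q_ensemble, axis=0)   # per-action epistemic-uncertainty proxy
    weight = np.exp(-std / scale)      # smooth gate in (0, 1]
    return base_lr * weight
```

For example, with two ensemble members agreeing on the first action and disagreeing on the second (`[[1.0, 2.0], [1.0, 0.0]]`), the first action keeps the full step size while the second is sharply damped.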
no code implementations • 23 Feb 2021 • DiJia Su, Jason D. Lee, John M. Mulvey, H. Vincent Poor
We consider a setting that lies between pure offline reinforcement learning (RL) and pure online RL, called deployment-constrained RL, in which the number of policy deployments for data sampling is limited.