1 code implementation • 28 Feb 2024 • Hai Nguyen, Tadashi Kozuno, Cristian C. Beltran-Hernandez, Masashi Hamaya
This study tackles the representative yet challenging contact-rich peg-in-hole task of robotic assembly, using a soft wrist that can operate more safely and tolerate lower-frequency control signals than a rigid one.
1 code implementation • 31 Jan 2024 • Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo
We study a primal-dual reinforcement learning (RL) algorithm for the online constrained Markov decision processes (CMDP) problem, wherein the agent explores an optimal policy that maximizes return while satisfying constraints.
no code implementations • 1 Sep 2023 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.
no code implementations • 29 May 2023 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko
Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.
1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.
1 code implementation • 18 Apr 2023 • Kazumi Kasaura, Shuwa Miura, Tadashi Kozuno, Ryo Yonetani, Kenta Hoshino, Yohei Hosoe
This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms.
no code implementations • 2 Feb 2023 • Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang
Moreover, we prove the alternative form still plays a similar role as the original form.
1 code implementation • 23 Dec 2022 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
Imperfect information games (IIG) are games in which each player only partially observes the current game state.
no code implementations • 27 Oct 2022 • Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári
Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies.
no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
no code implementations • 18 May 2022 • Han Wang, Archit Sakhadeo, Adam White, James Bell, Vincent Liu, Xutong Zhao, Puer Liu, Tadashi Kozuno, Alona Fyshe, Martha White
The performance of reinforcement learning (RL) agents is sensitive to the choice of hyperparameters.
no code implementations • NeurIPS 2021 • Tadashi Kozuno, Pierre Ménard, Remi Munos, Michal Valko
We study the problem of learning a Nash equilibrium (NE) in an extensive game with imperfect information (EGII) through self-play.
no code implementations • ICLR 2022 • Dongqi Han, Tadashi Kozuno, Xufang Luo, Zhao-Yun Chen, Kenji Doya, Yuqing Yang, Dongsheng Li
How to make intelligent decisions is a central problem in machine learning and cognitive science.
no code implementations • 17 Jul 2021 • Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White
Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification.
1 code implementation • NeurIPS 2021 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko
Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions.
no code implementations • 11 Jun 2021 • Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko
We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.
1 code implementation • NeurIPS 2021 • Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu
These results show which implementation or code details are co-adapted and co-evolved with algorithms, and which are transferable across algorithms: as examples, we identified that tanh Gaussian policy and network sizes are highly adapted to algorithmic types, while layer normalization and ELU are critical for MPO's performances but also transfer to noticeable gains in SAC.
1 code implementation • 23 Mar 2021 • Hiroki Furuta, Tatsuya Matsushima, Tadashi Kozuno, Yutaka Matsuo, Sergey Levine, Ofir Nachum, Shixiang Shane Gu
Progress in deep reinforcement learning (RL) research is largely enabled by benchmark task environments.
no code implementations • 27 Feb 2021 • Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel
These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.
no code implementations • NeurIPS 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Remi Munos, Matthieu Geist
Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.
no code implementations • 31 Mar 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist
Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.
no code implementations • 18 Jun 2019 • Tadashi Kozuno, Dongqi Han, Kenji Doya
We provide detailed theoretical analysis of the new algorithm that shows its efficiency and noise-tolerance inherited from Retrace and advantage learning.
no code implementations • 30 Oct 2017 • Tadashi Kozuno, Eiji Uchibe, Kenji Doya
Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks.