no code implementations • 17 Apr 2024 • Akifumi Wachi, Thien Q Tran, Rei Sato, Takumi Tanabe, Yohei Akimoto
This paper formulates human value alignment as a language model policy optimization problem that maximizes reward under a safety constraint, and then proposes an algorithm called Stepwise Alignment for Constrained Policy Optimization (SACPO).
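As a rough illustration of the underlying objective (not the authors' SACPO implementation, which aligns a language model stepwise), the following toy sketch maximizes expected reward subject to an expected-safety constraint via a Lagrangian relaxation over a softmax policy; all scores and the four-response candidate set are hypothetical.

```python
import numpy as np

# Toy sketch of max_pi E[reward] s.t. E[safety] >= b via Lagrangian
# relaxation. The softmax "policy" over four candidate responses and
# the scores below are illustrative placeholders.
rewards = np.array([1.0, 0.8, 0.3, 0.2])   # hypothetical reward scores
safety  = np.array([0.1, 0.9, 0.95, 0.6])  # hypothetical safety scores
b = 0.7                                    # safety threshold

theta = np.zeros(4)   # policy logits
lam = 0.0             # Lagrange multiplier
lr, lr_lam = 0.5, 0.1

for _ in range(200):
    pi = np.exp(theta) / np.exp(theta).sum()
    # Policy gradient of the Lagrangian E[r + lam * g] under pi.
    adv = rewards + lam * safety
    theta += lr * pi * (adv - pi @ adv)
    # Dual ascent: raise lam while the safety constraint is violated.
    lam = max(0.0, lam - lr_lam * (pi @ safety - b))

pi = np.exp(theta) / np.exp(theta).sum()
print("policy:", pi.round(3), "E[safety]:", round(pi @ safety, 3))
```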
no code implementations • 3 Feb 2024 • Akifumi Wachi, Xun Shen, Yanan Sui
Safety is critical when applying reinforcement learning (RL) to real-world problems.
no code implementations • 8 Jan 2024 • Akifumi Wachi, Wataru Hashimoto, Kazumune Hashimoto
Our theoretical results show that LoBiSaRL guarantees satisfaction of the long-term safety constraint with high probability.
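The generic recipe behind such high-probability guarantees can be sketched as conservative action selection: an action is executed only when a lower confidence bound on its estimated safety clears the threshold. The snippet below is a generic illustration with placeholder estimates, not LoBiSaRL's actual inference machinery.

```python
# Generic sketch: keep only actions whose safety lower confidence
# bound (mean - beta * std) exceeds the threshold. All estimates and
# action names are placeholders.
def safe_actions(actions, safety_mean, safety_std, threshold, beta=2.0):
    return [a for a in actions
            if safety_mean[a] - beta * safety_std[a] >= threshold]

mean = {"stop": 0.99, "slow": 0.90, "fast": 0.80}
std  = {"stop": 0.01, "slow": 0.05, "fast": 0.20}
print(safe_actions(list(mean), mean, std, threshold=0.7))
# -> ['stop', 'slow']; 'fast' is too uncertain to be provably safe.
```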
no code implementations • 16 Oct 2023 • Keita Saito, Akifumi Wachi, Koki Wataoka, Youhei Akimoto
Recent years have witnessed a remarkable surge in the prevalence of Large Language Models (LLMs), altering the landscape of natural language processing and machine learning.
no code implementations • 10 Aug 2023 • Wataru Hashimoto, Kazumune Hashimoto, Akifumi Wachi, Xun Shen, Masako Kishida, Shigemasa Takai
The proposed scheme enables efficient online synthesis of the controller, as shown in the simulation study, and provides probabilistic safety guarantees on the resulting controller.
1 code implementation • NeurIPS 2021 • Akifumi Wachi, Yunyue Wei, Yanan Sui
Safe exploration is a key to applying reinforcement learning (RL) in safety-critical systems.
1 code implementation • ACL 2021 • Daiki Kimura, Subhajit Chaudhury, Masaki Ono, Michiaki Tatsubori, Don Joven Agravante, Asim Munawar, Akifumi Wachi, Ryosuke Kohita, Alexander Gray
We present Logical Optimal Actions (LOA), an action decision architecture for reinforcement learning applications built on a neuro-symbolic framework that combines neural networks with a symbolic knowledge acquisition approach for natural language interaction games.
no code implementations • EMNLP 2021 • Daiki Kimura, Masaki Ono, Subhajit Chaudhury, Ryosuke Kohita, Akifumi Wachi, Don Joven Agravante, Michiaki Tatsubori, Asim Munawar, Alexander Gray
Deep reinforcement learning (RL) methods often require many trials before convergence and provide no direct interpretability of the trained policies.
no code implementations • 3 Mar 2021 • Daiki Kimura, Subhajit Chaudhury, Akifumi Wachi, Ryosuke Kohita, Asim Munawar, Michiaki Tatsubori, Alexander Gray
Specifically, we propose an integrated method that enables model-free reinforcement learning from external knowledge sources within an LNN-based logically constrained framework, supporting mechanisms such as action shielding and guidance.
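A minimal sketch of the action-shielding idea (the LNN machinery is abstracted away and all names are illustrative): symbolic knowledge marks some actions as forbidden in the current state, and the policy's distribution is masked before sampling so that forbidden actions are never taken.

```python
import numpy as np

def shielded_sample(logits, allowed, rng):
    """Sample from softmax(logits) restricted to allowed actions."""
    masked = np.where(allowed, logits, -np.inf)  # forbid disallowed actions
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 1.0, -1.0])
allowed = np.array([True, False, True, True])  # e.g., one action ruled out
print(shielded_sample(logits, allowed, rng))
```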
no code implementations • CoNLL (EMNLP) 2021 • Ran Iwamoto, Ryosuke Kohita, Akifumi Wachi
In particular, recent approaches such as hyperbolic embeddings have shown significant performance by representing essential meanings in a hierarchy (generality and similarity of objects) through spatial properties (distance from the origin and angular differences).
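For reference, the geodesic distance on the Poincaré ball that such hyperbolic embeddings rely on is easy to state; in the sketch below the two example vectors are purely illustrative, with the norm of a point encoding generality and angles encoding similarity.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(denom, eps))

general  = np.array([0.1, 0.0])   # near the origin: a general concept
specific = np.array([0.8, 0.3])   # near the boundary: a specific concept
print(poincare_distance(general, specific))
```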
1 code implementation • EMNLP 2020 • Ryosuke Kohita, Akifumi Wachi, Yang Zhao, Ryuki Tachibana
Q-learning is leveraged to train the agent to produce proper edit actions.
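A minimal tabular Q-learning sketch of this setup (the state encoding and the three edit actions below are hypothetical, not the paper's actual action set): the agent epsilon-greedily picks an edit action for the current sentence state and is updated toward the usual one-step TD target.

```python
import random
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)] table
alpha, gamma, eps = 0.1, 0.99, 0.1
ACTIONS = ["keep", "delete", "replace"]  # hypothetical edit actions

def act(state):
    """Epsilon-greedy edit-action selection."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One-step TD update toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```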
1 code implementation • ICML 2020 • Akifumi Wachi, Yanan Sui
Safe reinforcement learning has been a promising approach for optimizing the policy of an agent that operates in safety-critical applications.
no code implementations • 26 Mar 2019 • Akifumi Wachi
We propose a method for efficiently finding failure scenarios: it trains adversarial agents with multi-agent reinforcement learning so that the rule-based agent under test fails.
no code implementations • 12 Sep 2018 • Akifumi Wachi, Hiroshi Kajino, Asim Munawar
This paper presents a learning algorithm called ST-SafeMDP for exploring Markov decision processes (MDPs) under the assumption that the safety features are a priori unknown and time-variant.