no code implementations • 17 Apr 2024 • Akifumi Wachi, Thien Q Tran, Rei Sato, Takumi Tanabe, Yohei Akimoto
This paper formulates human value alignment as a language model policy optimization problem that maximizes reward under a safety constraint, and then proposes an algorithm called Stepwise Alignment for Constrained Policy Optimization (SACPO).
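As a rough illustration of the underlying objective (not the authors' SACPO implementation, which aligns a language model stepwise), the following toy sketch maximizes expected reward subject to an expected-safety constraint via a Lagrangian relaxation over a softmax policy; all scores and the four-response candidate set are hypothetical.

```python
import numpy as np

# Toy sketch of max_pi E[reward] s.t. E[safety] >= b via Lagrangian
# relaxation. The softmax "policy" over four candidate responses and
# the scores below are illustrative placeholders.
rewards = np.array([1.0, 0.8, 0.3, 0.2])   # hypothetical reward scores
safety  = np.array([0.1, 0.9, 0.95, 0.6])  # hypothetical safety scores
b = 0.7                                    # safety threshold

theta = np.zeros(4)   # policy logits
lam = 0.0             # Lagrange multiplier
lr, lr_lam = 0.5, 0.1

for _ in range(200):
    pi = np.exp(theta) / np.exp(theta).sum()
    # Policy gradient of the Lagrangian E[r + lam * g] under pi.
    adv = rewards + lam * safety
    theta += lr * pi * (adv - pi @ adv)
    # Dual ascent: raise lam while the safety constraint is violated.
    lam = max(0.0, lam - lr_lam * (pi @ safety - b))

pi = np.exp(theta) / np.exp(theta).sum()
print("policy:", pi.round(3), "E[safety]:", round(pi @ safety, 3))
```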
no code implementations • 3 Feb 2024 • Akifumi Wachi, Xun Shen, Yanan Sui
Safety is critical when applying reinforcement learning (RL) to real-world problems.
no code implementations • 8 Jan 2024 • Akifumi Wachi, Wataru Hashimoto, Kazumune Hashimoto
Our theoretical results show that LoBiSaRL guarantees satisfaction of the long-term safety constraint with high probability.
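The generic recipe behind such high-probability guarantees can be sketched as conservative action selection: an action is executed only when a lower confidence bound on its estimated safety clears the threshold. The snippet below is a generic illustration with placeholder estimates, not LoBiSaRL's actual inference machinery.

```python
# Generic sketch: keep only actions whose safety lower confidence
# bound (mean - beta * std) exceeds the threshold. All estimates and
# action names are placeholders.
def safe_actions(actions, safety_mean, safety_std, threshold, beta=2.0):
    return [a for a in actions
            if safety_mean[a] - beta * safety_std[a] >= threshold]

mean = {"stop": 0.99, "slow": 0.90, "fast": 0.80}
std  = {"stop": 0.01, "slow": 0.05, "fast": 0.20}
print(safe_actions(list(mean), mean, std, threshold=0.7))
# -> ['stop', 'slow']; 'fast' is too uncertain to be provably safe.
```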
no code implementations • 16 Oct 2023 • Keita Saito, Akifumi Wachi, Koki Wataoka, Youhei Akimoto
Recent years have witnessed a remarkable surge in the prevalence of Large Language Models (LLMs), altering the landscape of natural language processing and machine learning.
no code implementations • 10 Aug 2023 • Wataru Hashimoto, Kazumune Hashimoto, Akifumi Wachi, Xun Shen, Masako Kishida, Shigemasa Takai
The proposed scheme enables efficient online synthesis of the controller, as shown in the simulation study, and provides probabilistic safety guarantees on the resulting controller.
1 code implementation • NeurIPS 2021 • Akifumi Wachi, Yunyue Wei, Yanan Sui
Safe exploration is a key to applying reinforcement learning (RL) in safety-critical systems.
1 code implementation • ACL 2021 • Daiki Kimura, Subhajit Chaudhury, Masaki Ono, Michiaki Tatsubori, Don Joven Agravante, Asim Munawar, Akifumi Wachi, Ryosuke Kohita, Alexander Gray
We present Logical Optimal Actions (LOA), an action decision architecture for reinforcement learning applications built on a neuro-symbolic framework that combines neural networks with a symbolic knowledge acquisition approach for natural language interaction games.
no code implementations • EMNLP 2021 • Daiki Kimura, Masaki Ono, Subhajit Chaudhury, Ryosuke Kohita, Akifumi Wachi, Don Joven Agravante, Michiaki Tatsubori, Asim Munawar, Alexander Gray
Deep reinforcement learning (RL) methods often require many trials before convergence and provide no direct interpretability of the trained policies.
no code implementations • 3 Mar 2021 • Daiki Kimura, Subhajit Chaudhury, Akifumi Wachi, Ryosuke Kohita, Asim Munawar, Michiaki Tatsubori, Alexander Gray
Specifically, we propose an integrated method that enables model-free reinforcement learning from external knowledge sources within an LNN-based logically constrained framework, supporting mechanisms such as action shielding and guidance.
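A minimal sketch of the action-shielding idea (the LNN machinery is abstracted away and all names are illustrative): symbolic knowledge marks some actions as forbidden in the current state, and the policy's distribution is masked before sampling so that forbidden actions are never taken.

```python
import numpy as np

def shielded_sample(logits, allowed, rng):
    """Sample from softmax(logits) restricted to allowed actions."""
    masked = np.where(allowed, logits, -np.inf)  # forbid disallowed actions
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 1.0, -1.0])
allowed = np.array([True, False, True, True])  # e.g., one action ruled out
print(shielded_sample(logits, allowed, rng))
```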
no code implementations • CoNLL (EMNLP) 2021 • Ran Iwamoto, Ryosuke Kohita, Akifumi Wachi
In particular, recent approaches such as hyperbolic embeddings have shown significant performance by representing essential meanings in a hierarchy (generality and similarity of objects) through spatial properties (distance from the origin and angular differences).
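For reference, the geodesic distance on the Poincaré ball that such hyperbolic embeddings rely on is easy to state; in the sketch below the two example vectors are purely illustrative, with the norm of a point encoding generality and angles encoding similarity.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(denom, eps))

general  = np.array([0.1, 0.0])   # near the origin: a general concept
specific = np.array([0.8, 0.3])   # near the boundary: a specific concept
print(poincare_distance(general, specific))
```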
1 code implementation • EMNLP 2020 • Ryosuke Kohita, Akifumi Wachi, Yang Zhao, Ryuki Tachibana
Q-learning is leveraged to train the agent to produce proper edit actions.
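A minimal tabular Q-learning sketch of this setup (the state encoding and the three edit actions below are hypothetical, not the paper's actual action set): the agent epsilon-greedily picks an edit action for the current sentence state and is updated toward the usual one-step TD target.

```python
import random
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)] table
alpha, gamma, eps = 0.1, 0.99, 0.1
ACTIONS = ["keep", "delete", "replace"]  # hypothetical edit actions

def act(state):
    """Epsilon-greedy edit-action selection."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One-step TD update toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```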
1 code implementation • ICML 2020 • Akifumi Wachi, Yanan Sui
Safe reinforcement learning has been a promising approach for optimizing the policy of an agent that operates in safety-critical applications.
no code implementations • 26 Mar 2019 • Akifumi Wachi
We propose a method for efficiently finding failure scenarios: it trains adversarial agents with multi-agent reinforcement learning so that the rule-based agent under test fails.
no code implementations • 12 Sep 2018 • Akifumi Wachi, Hiroshi Kajino, Asim Munawar
This paper presents a learning algorithm called ST-SafeMDP for exploring Markov decision processes (MDPs) under the assumption that the safety features are a priori unknown and time-variant.