Search Results for author: Takumi Tanabe

Found 3 papers, 2 papers with code

Stepwise Alignment for Constrained Language Model Policy Optimization

no code implementations • 17 Apr 2024 • Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Youhei Akimoto

This paper formulates human value alignment as an optimization problem of the language model policy to maximize reward under a safety constraint, and then proposes an algorithm, Stepwise Alignment for Constrained Policy Optimization (SACPO).

Computational Efficiency, Language Modelling

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

1 code implementation • 7 Nov 2022 • Takumi Tanabe, Rei Sato, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto

In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set.

Level Generation for Angry Birds with Sequential VAE and Latent Variable Evolution

1 code implementation • 13 Apr 2021 • Takumi Tanabe, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto

When ML techniques are applied to game domains with non-tile-based level representation, such as Angry Birds, where objects in a level are specified by real-valued parameters, ML often fails to generate playable levels.
