no code implementations • 17 Apr 2024 • Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Youhei Akimoto
This paper formulates human value alignment as an optimization problem over the language model policy, maximizing reward subject to a safety constraint, and then proposes an algorithm, Stepwise Alignment for Constrained Policy Optimization (SACPO).
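A minimal sketch of the constrained formulation the abstract describes, in my own notation (r for the reward, g for the safety measure, b for the safety threshold, and D for the prompt distribution; the paper's exact symbols may differ):

\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big] \quad \text{s.t.} \quad \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\big[ g(x, y) \big] \ge b

This only restates the optimization problem; SACPO's stepwise procedure for solving it is described in the paper itself.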
no code implementations • 9 Sep 2021 • Thien Q. Tran, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma
The challenge is that we must discover, in an unsupervised manner, a set of concepts, i.e., A, B, and C, that is useful for explaining the classifier.
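One generic way to obtain such concepts without labels is to cluster the classifier's intermediate activations; the sketch below illustrates that general idea only and is not necessarily the paper's algorithm (the discover_concepts helper and all parameters are hypothetical):

import numpy as np
from sklearn.cluster import KMeans

def discover_concepts(activations: np.ndarray, n_concepts: int = 3):
    """Cluster hidden-layer activations into candidate concepts.

    activations: (n_samples, n_features) array of intermediate outputs.
    Returns the cluster centers (one per concept) and per-sample labels.
    """
    km = KMeans(n_clusters=n_concepts, n_init=10, random_state=0)
    labels = km.fit_predict(activations)
    return km.cluster_centers_, labels

# Toy usage: three concepts (A, B, C) from placeholder activations.
rng = np.random.default_rng(0)
centers, labels = discover_concepts(rng.normal(size=(500, 64)), n_concepts=3)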
no code implementations • 22 Aug 2020 • Thien Q. Tran, Jun Sakuma
We also carefully design a feature selection method that chooses appropriate search terms for predicting each component.
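As a rough illustration of search-term selection, an L1-penalized regression keeps only terms whose query volumes help predict the target component; this is a common generic strategy, not a claim about the paper's specific method (the select_search_terms helper and the term list are hypothetical):

import numpy as np
from sklearn.linear_model import Lasso

def select_search_terms(X: np.ndarray, y: np.ndarray, terms: list[str], alpha: float = 0.1):
    """X: (n_periods, n_terms) query volumes; y: (n_periods,) target component.

    Fits a lasso and returns the terms with non-zero coefficients.
    """
    model = Lasso(alpha=alpha).fit(X, y)
    return [t for t, w in zip(terms, model.coef_) if abs(w) > 1e-8]

# Toy usage with synthetic data: terms 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=100)
print(select_search_terms(X, y, ["term_a", "term_b", "term_c", "term_d", "term_e"]))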