Search Results for author: Thien Q Tran

Found 1 paper, 0 papers with code

Stepwise Alignment for Constrained Language Model Policy Optimization

No code implementations | 17 Apr 2024 | Akifumi Wachi, Thien Q Tran, Rei Sato, Takumi Tanabe, Yohei Akimoto

This paper formulates human value alignment as a language model policy optimization problem that maximizes reward under a safety constraint, and then proposes an algorithm called Stepwise Alignment for Constrained Policy Optimization (SACPO).
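As a hedged illustration of the problem class described above, reward maximization under a safety constraint for a language model policy is commonly written as a KL-regularized constrained objective. The symbols here (reward function r, safety function g, threshold b, KL coefficient β, prompt distribution D, reference policy π_ref) are assumptions chosen for illustration, not notation taken from the paper, and the paper's exact formulation may differ.

```latex
% Generic constrained policy optimization objective (illustrative sketch, assumed notation)
\max_{\pi_\theta}\;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[\, r(x, y) \,\big]
  \;-\; \beta\, \mathbb{E}_{x \sim \mathcal{D}}\Big[ \mathbb{D}_{\mathrm{KL}}\big( \pi_\theta(\cdot \mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big) \Big]
\quad \text{subject to} \quad
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[\, g(x, y) \,\big] \;\ge\; b
```

Here a larger g(x, y) is assumed to mean a safer response, so the constraint requires the expected safety score to stay at or above the threshold b while the policy is pushed toward higher reward and kept close to the reference policy.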

Tasks: Computational Efficiency, Language Modelling
