no code implementations • 17 Apr 2024 • Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Yohei Akimoto
This paper formulates human value alignment as a language model policy optimization problem that maximizes reward subject to a safety constraint, and then proposes an algorithm called Stepwise Alignment for Constrained Policy Optimization (SACPO).
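As a minimal sketch of the kind of constrained formulation the abstract describes (the notation below is illustrative and assumed, not taken from the paper): a policy $\pi$ is optimized to maximize an expected reward $r$ while keeping an expected safety score $g$ above a threshold $b$, typically with a KL penalty tying the policy to a reference model $\pi_{\mathrm{ref}}$.

% Illustrative constrained RLHF-style objective; symbols r, g, b, beta,
% and pi_ref are assumed for exposition, not quoted from the paper.
\[
\max_{\pi}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\!\left[ r(x, y) \right]
\;-\; \beta\, D_{\mathrm{KL}}\!\left( \pi \,\|\, \pi_{\mathrm{ref}} \right)
\quad \text{s.t.} \quad
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\!\left[ g(x, y) \right] \ge b
\]

The "stepwise" in SACPO's name suggests that alignment with respect to the reward and the safety metric is performed in sequential steps rather than jointly.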