no code implementations • 29 Apr 2024 • Scott Viteri, Max Lamparth, Peter Chatain, Clark Barrett
We derive a "Markovian training" procedure by applying our definition of informativeness to a Markovian LM and optimizing via policy gradient and Proximal Policy Optimization (PPO).
1 code implementation • 25 Oct 2023 • Gabriel Mukobi, Peter Chatain, Su Fong, Robert Windesheim, Gitta Kutyniok, Kush Bhatia, Silas Alberti
Here, we focus on two prevalent methods used to align these models, Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF).
no code implementations • 14 Feb 2023 • Michael Sun, Peter Chatain
In recent years, neural networks (NNs) have made giant leaps in a wide variety of domains.