2 code implementations • 14 Mar 2023 • Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt
We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer.
no code implementations • 9 Jan 2023 • Lev McKinney, Yawen Duan, David Krueger, Adam Gleave
Our work focuses on demonstrating and studying the causes of relearning failures in the domain of preference-based reward learning.