Search Results for author: Lev McKinney

Found 2 papers, 1 paper with code

Eliciting Latent Predictions from Transformers with the Tuned Lens

2 code implementations • 14 Mar 2023 • Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer.

Language Modelling

931

On The Fragility of Learned Reward Functions

no code implementations • 9 Jan 2023 • Lev McKinney, Yawen Duan, David Krueger, Adam Gleave

Our work focuses on demonstrating and studying the causes of these relearning failures in the domain of preference-based reward learning.

Continuous Control
