Search Results for author: Jan Leike

Found 36 papers, 14 papers with code

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

no code implementations • 14 Dec 2023 • Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu

Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior (for example, to evaluate whether a model faithfully followed instructions or generated safe outputs).

Let's Verify Step by Step

3 code implementations • Preprint 2023 • Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.

Ranked #1 on Math Word Problem Solving on MATH minival (using extra training data)

Active Learning, Math, +2 more
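Roughly, outcome supervision gives feedback only on a solution's final answer, while process supervision gives feedback on each intermediate reasoning step. The following is a minimal sketch of how the two signals can rank candidate solutions differently. It assumes a trained process reward model has already assigned each step a probability of being correct and, as an assumption of this sketch, scores a whole solution as the product of those probabilities. The function names and all numbers are hypothetical.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def process_score(step_probs: Sequence[float]) -> float:
    """Probability that every step is correct, treating steps as independent."""
    return math.prod(step_probs)


def outcome_score(final_answer_prob: float) -> float:
    """An outcome-supervised scorer sees only the final answer."""
    return final_answer_prob


def best_of_n(candidates: List[T], scorer: Callable[[T], float]) -> int:
    """Index of the candidate solution that the scorer ranks highest."""
    return max(range(len(candidates)), key=lambda i: scorer(candidates[i]))


# Hypothetical judgements for two candidate solutions to the same problem.
# Solution A reaches a plausible final answer despite one clearly bad step;
# solution B is solid at every step.
step_judgements = [
    [0.99, 0.98, 0.10, 0.97],  # solution A
    [0.90, 0.92, 0.91, 0.93],  # solution B
]
final_answer_judgements = [0.95, 0.90]  # outcome-only view of A and B

print(best_of_n(step_judgements, process_score))          # 1 (solution B)
print(best_of_n(final_answer_judgements, outcome_score))  # 0 (solution A)
```

Under this product rule a single bad step dominates the score, which is one way step-level feedback can penalize flawed reasoning that an answer-only signal would let through.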

GPT-4 Technical Report

9 code implementations • Preprint 2023 • OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, Madelaine Boyd, Anna-Luisa Brakman, Greg Brockman, Tim Brooks, Miles Brundage, Kevin Button, Trevor Cai, Rosie Campbell, Andrew Cann, Brittany Carey, Chelsea Carlson, Rory Carmichael, Brooke Chan, Che Chang, Fotis Chantzis, Derek Chen, Sully Chen, Ruby Chen, Jason Chen, Mark Chen, Ben Chess, Chester Cho, Casey Chu, Hyung Won Chung, Dave Cummings, Jeremiah Currier, Yunxing Dai, Cory Decareaux, Thomas Degry, Noah Deutsch, Damien Deville, Arka Dhar, David Dohan, Steve Dowling, Sheila Dunning, Adrien Ecoffet, Atty Eleti, Tyna Eloundou, David Farhi, Liam Fedus, Niko Felix, Simón Posada Fishman, Juston Forte, Isabella Fulford, Leo Gao, Elie Georges, Christian Gibson, Vik Goel, Tarun Gogineni, Gabriel Goh, Rapha Gontijo-Lopes, Jonathan Gordon, Morgan Grafstein, Scott Gray, Ryan Greene, Joshua Gross, Shixiang Shane Gu, Yufei Guo, Chris Hallacy, Jesse Han, Jeff Harris, Yuchen He, Mike Heaton, Johannes Heidecke, Chris Hesse, Alan Hickey, Wade Hickey, Peter Hoeschele, Brandon Houghton, Kenny Hsu, Shengli Hu, Xin Hu, Joost Huizinga, Shantanu Jain, Shawn Jain, Joanne Jang, Angela Jiang, Roger Jiang, Haozhun Jin, Denny Jin, Shino Jomoto, Billie Jonn, Heewoo Jun, Tomer Kaftan, Łukasz Kaiser, Ali Kamali, Ingmar Kanitscheider, Nitish Shirish Keskar, Tabarak Khan, Logan Kilpatrick, Jong Wook Kim, Christina Kim, Yongjik Kim, Jan Hendrik Kirchner, Jamie Kiros, Matt Knight, Daniel Kokotajlo, Łukasz Kondraciuk, Andrew Kondrich, Aris Konstantinidis, Kyle Kosic, Gretchen Krueger, Vishal Kuo, Michael Lampe, Ikai Lan, Teddy Lee, Jan Leike, Jade Leung, Daniel Levy, Chak Ming Li, Rachel Lim, Molly Lin, Stephanie Lin, Mateusz Litwin, Theresa Lopez, Ryan Lowe, Patricia Lue, Anna Makanju, Kim Malfacini, Sam Manning, Todor Markov, Yaniv Markovski, Bianca Martin, Katie Mayer, Andrew Mayne, Bob McGrew, Scott Mayer McKinney, Christine McLeavey, Paul McMillan, Jake McNeil, David Medina, Aalok Mehta, Jacob Menick, Luke Metz, Andrey Mishchenko, Pamela Mishkin, Vinnie Monaco, Evan Morikawa, Daniel Mossing, Tong Mu, Mira Murati, Oleg Murk, David Mély, Ashvin Nair, Reiichiro Nakano, Rajeev Nayak, Arvind Neelakantan, Richard Ngo, Hyeonwoo Noh, Long Ouyang, Cullen O'Keefe, Jakub Pachocki, Alex Paino, Joe Palermo, Ashley Pantuliano, Giambattista Parascandolo, Joel Parish, Emy Parparita, Alex Passos, Mikhail Pavlov, Andrew Peng, Adam Perelman, Filipe de Avila Belbute Peres, Michael Petrov, Henrique Ponde de Oliveira Pinto, Michael Pokorny, Michelle Pokrass, Vitchyr H. Pong, Tolly Powell, Alethea Power, Boris Power, Elizabeth Proehl, Raul Puri, Alec Radford, Jack Rae, Aditya Ramesh, Cameron Raymond, Francis Real, Kendra Rimbach, Carl Ross, Bob Rotsted, Henri Roussez, Nick Ryder, Mario Saltarelli, Ted Sanders, Shibani Santurkar, Girish Sastry, Heather Schmidt, David Schnurr, John Schulman, Daniel Selsam, Kyla Sheppard, Toki Sherbakov, Jessica Shieh, Sarah Shoker, Pranav Shyam, Szymon Sidor, Eric Sigler, Maddie Simens, Jordan Sitkin, Katarina Slama, Ian Sohl, Benjamin Sokolowsky, Yang Song, Natalie Staudacher, Felipe Petroski Such, Natalie Summers, Ilya Sutskever, Jie Tang, Nikolas Tezak, Madeleine B. Thompson, Phil Tillet, Amin Tootoonchian, Elizabeth Tseng, Preston Tuggle, Nick Turley, Jerry Tworek, Juan Felipe Cerón Uribe, Andrea Vallone, Arun Vijayvergiya, Chelsea Voss, Carroll Wainwright, Justin Jay Wang, Alvin Wang, Ben Wang, Jonathan Ward, Jason Wei, CJ Weinmann, Akila Welihinda, Peter Welinder, Jiayi Weng, Lilian Weng, Matt Wiethoff, Dave Willner, Clemens Winter, Samuel Wolrich, Hannah Wong, Lauren Workman, Sherwin Wu, Jeff Wu, Michael Wu, Kai Xiao, Tao Xu, Sarah Yoo, Kevin Yu, Qiming Yuan, Wojciech Zaremba, Rowan Zellers, Chong Zhang, Marvin Zhang, Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.

Ranked #1 on Multi-task Language Understanding on MMLU (using extra training data)

Arithmetic Reasoning, Bug fixing, +10 more

Self-critiquing models for assisting human evaluators

1 code implementation • 12 Jun 2022 • William Saunders, Catherine Yeh, Jeff Wu, Steven Bills, Long Ouyang, Jonathan Ward, Jan Leike

On a topic-based summarization task, critiques written by our models help humans find flaws in summaries that they would have otherwise missed.

Training language models to follow instructions with human feedback

8 code implementations • 4 Mar 2022 • Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback.
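Fine-tuning with human feedback usually begins by fitting a reward model to human comparisons between pairs of outputs. The sketch below covers only that step and rests on several assumptions: a pairwise loss of the form -log σ(r(chosen) - r(rejected)), a hypothetical linear reward over hand-made feature vectors in place of a language model, and plain stochastic gradient descent. All names and data are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

# (features of the preferred output, features of the rejected output)
Pair = Tuple[List[float], List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reward(w: Sequence[float], x: Sequence[float]) -> float:
    """Hypothetical linear reward model r(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_loss(w: Sequence[float], chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """-log sigmoid(r(chosen) - r(rejected)) for a single human comparison."""
    return -math.log(sigmoid(reward(w, chosen) - reward(w, rejected)))


def train(pairs: Sequence[Pair], dim: int, lr: float = 0.1, epochs: int = 200, seed: int = 0) -> List[float]:
    """Fit the linear reward model with stochastic gradient descent."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(pairs)
    for _ in range(epochs):
        rng.shuffle(order)
        for chosen, rejected in order:
            margin = reward(w, chosen) - reward(w, rejected)
            # The loss gradient w.r.t. w is -(1 - sigmoid(margin)) * (chosen - rejected),
            # so a descent step moves w toward the preferred output's features.
            step = lr * (1.0 - sigmoid(margin))
            for i in range(dim):
                w[i] += step * (chosen[i] - rejected[i])
    return w


# Hypothetical features: [helpfulness, toxicity]. Labelers prefer the first item of each pair.
PAIRS: List[Pair] = [
    ([1.0, 0.0], [0.2, 0.0]),
    ([0.5, 0.0], [0.5, 1.0]),
    ([0.9, 0.1], [0.3, 0.8]),
]

w = train(PAIRS, dim=2)
print(w)  # helpfulness weight > 0, toxicity weight < 0
```

In RLHF-style pipelines the learned reward is then optimized with reinforcement learning; that stage is omitted here.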

Safe Deep RL in 3D Environments using Human Feedback

no code implementations • 20 Jan 2022 • Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, Jan Leike

In this paper we answer this question in the affirmative, using ReQueST to train an agent to perform a 3D first-person object collection task using data entirely from human contractors.

Revealing the Incentive to Cause Distributional Shift

no code implementations • 29 Sep 2021 • David Krueger, Tegan Maharaj, Jan Leike

We use these unit tests to demonstrate that changes to the learning algorithm (e.g., introducing meta-learning) can cause previously hidden incentives to be revealed, resulting in qualitatively different behaviour despite no change in performance metric.

Meta-Learning

Recursively Summarizing Books with Human Feedback

no code implementations • 22 Sep 2021 • Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano

Our human labelers are able to supervise and evaluate the models quickly, despite not having read the entire books themselves.

Abstractive Text Summarization, Question Answering

Evaluating Large Language Models Trained on Code

13 code implementations • 7 Jul 2021 • Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities.

Ranked #1 on Multi-task Language Understanding on BBH-alg

Code Generation, Language Modelling, +1 more
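Python code-writing ability in this paper is reported with the pass@k metric: the probability that at least one of k sampled programs passes a problem's unit tests. Below is a minimal sketch of an unbiased estimator for it, written from the formula rather than copied from any released code. It assumes n ≥ k samples were drawn for a problem and c of them passed, and computes 1 - C(n-c, k)/C(n, k) in product form to avoid huge binomial coefficients. The function names are illustrative.

```python
import math
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c pass the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # C(n-c, k) / C(n, k) equals the product of (1 - k/i) for i = n-c+1 .. n.
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) for each problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


print(round(pass_at_k(10, 3, 1), 4))  # 0.3
print(round(pass_at_k(10, 3, 2), 4))  # 0.5333
```

Estimating from n > k samples and averaging over subsets gives lower variance than literally drawing k samples once per problem.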

Institutionalising Ethics in AI through Broader Impact Requirements

no code implementations • 30 May 2021 • Carina Prunkl, Carolyn Ashurst, Markus Anderljung, Helena Webb, Jan Leike, Allan Dafoe

In 2020, the Conference on Neural Information Processing Systems (NeurIPS) introduced a requirement for submitting authors to include a statement on the broader societal impacts of their research.

Ethics

Active Reinforcement Learning: Observing Rewards at a Cost

no code implementations • 13 Nov 2020 • David Krueger, Jan Leike, Owain Evans, John Salvatier

Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0.

Multi-Armed Bandits, reinforcement-learning, +1 more
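The ARL setting can be illustrated with a toy Bernoulli bandit. Each pull produces a reward that counts toward the agent's return but stays hidden unless the agent pays the query cost c. The policy below is a hypothetical baseline chosen for brevity, not the algorithm proposed in the paper: it queries every reward during a fixed number of round-robin exploration steps, then exploits the empirically best arm without querying. The point is the accounting, which is assumed here to be net return = total reward - c × (number of queries).

```python
import random
from typing import Sequence, Tuple


def run_active_bandit(
    means: Sequence[float],
    horizon: int,
    query_cost: float,
    explore_rounds: int,
    seed: int = 0,
) -> Tuple[float, int, float]:
    """Simulate a query-then-exploit agent; return (total_reward, queries, net_return)."""
    if query_cost <= 0:
        raise ValueError("ARL assumes a strictly positive query cost")
    rng = random.Random(seed)
    k = len(means)
    sums = [0.0] * k    # sum of *observed* rewards per arm
    counts = [0] * k    # number of observed rewards per arm
    total_reward = 0.0  # every reward counts, observed or not
    queries = 0
    for t in range(horizon):
        query = t < explore_rounds
        if query:
            arm = t % k  # round-robin exploration while paying to observe
        else:
            # Exploit the best empirical mean; arms never observed default to 0.
            arm = max(range(k), key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)
        r = 1.0 if rng.random() < means[arm] else 0.0
        total_reward += r
        if query:
            queries += 1
            sums[arm] += r
            counts[arm] += 1
    return total_reward, queries, total_reward - query_cost * queries


total, queries, net = run_active_bandit([0.1, 0.9], horizon=200, query_cost=0.5, explore_rounds=20)
print(queries)  # 20
```

Raising c makes a longer exploration phase more expensive, which is the trade-off between information and cost that the setting exposes.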

Hidden Incentives for Auto-Induced Distributional Shift

no code implementations • 19 Sep 2020 • David Krueger, Tegan Maharaj, Jan Leike

We introduce the term auto-induced distributional shift (ADS) to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs.

BIG-bench Machine Learning, Meta-Learning, +1 more

Quantifying Differences in Reward Functions

1 code implementation • ICLR 2021 • Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike

However, this method cannot distinguish between the learned reward function failing to reflect user preferences and the policy optimization process failing to optimize the learned reward.

Pitfalls of learning a reward function online

no code implementations • 28 Apr 2020 • Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg

We formally introduce two desirable properties: the first is 'unriggability', which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise.

Learning Human Objectives by Evaluating Hypothetical Behavior

1 code implementation • ICML 2020 • Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike

To address this challenge, we propose an algorithm that safely and interactively learns a model of the user's reward function.

Car Racing

Hidden incentives for self-induced distributional shift

no code implementations • 25 Sep 2019 • David Scott Krueger, Tegan Maharaj, Shane Legg, Jan Leike

Decisions made by machine learning systems have increasing influence on the world.

BIG-bench Machine Learning, Meta-Learning

Scaling shared model governance via model splitting

no code implementations • ICLR 2019 • Miljan Martic, Jan Leike, Andrew Trask, Matteo Hessel, Shane Legg, Pushmeet Kohli

Currently the only techniques for sharing governance of a deep learning model are homomorphic encryption and secure multiparty computation.

reinforcement-learning, Reinforcement Learning (RL)

Scalable agent alignment via reward modeling: a research direction

3 code implementations • 19 Nov 2018 • Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions.

Atari Games, reinforcement-learning, +1 more

Reward learning from human preferences and demonstrations in Atari

2 code implementations • NeurIPS 2018 • Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions.

Atari Games, Imitation Learning, +2 more

Learning to Understand Goal Specifications by Modelling Reward

1 code implementation • ICLR 2019 • Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette

Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards.

AI Safety Gridworlds

2 code implementations • 27 Nov 2017 • Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.

reinforcement-learning, Reinforcement Learning (RL), +1 more

Deep reinforcement learning from human preferences

5 code implementations • NeurIPS 2017 • Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems.

Atari Games, reinforcement-learning, +1 more

Universal Reinforcement Learning Algorithms: Survey and Experiments

1 code implementation • 30 May 2017 • John Aslanides, Jan Leike, Marcus Hutter

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP).

reinforcement-learning, Reinforcement Learning (RL)

Generalised Discount Functions applied to a Monte-Carlo AImu Implementation

1 code implementation • 3 Mar 2017 • Sean Lamont, John Aslanides, Jan Leike, Marcus Hutter

We have added to the GRL simulation platform AIXIjs the functionality to assign an agent arbitrary discount functions, and an environment which can be used to determine the effect of discounting on an agent's policy.

General Reinforcement Learning, reinforcement-learning, +1 more
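To illustrate what assigning an agent an arbitrary discount function means, the sketch below treats the discount function as a parameter and compares geometric discounting, γ^t, with hyperbolic discounting, 1/(1 + κt). The helper names and parameter values are assumptions made for illustration; this is not the AIXIjs API. The comparison shows one way discounting can change a policy: under hyperbolic discounting the preference between a smaller-sooner and a larger-later reward can reverse as both move into the future, while geometric discounting keeps it fixed.

```python
from typing import Callable, Sequence

# A discount function maps a delay t >= 0 (in steps) to a weight.
DiscountFn = Callable[[int], float]


def geometric(gamma: float) -> DiscountFn:
    """Standard exponential discounting: gamma ** t."""
    return lambda t: gamma ** t


def hyperbolic(kappa: float) -> DiscountFn:
    """Hyperbolic discounting: 1 / (1 + kappa * t)."""
    return lambda t: 1.0 / (1.0 + kappa * t)


def discounted_return(rewards: Sequence[float], discount: DiscountFn) -> float:
    """Sum of discount(t) * r_t over a finite reward sequence."""
    return sum(discount(t) * r for t, r in enumerate(rewards))


def prefers_sooner(discount: DiscountFn, small: float, large: float,
                   delay_small: int, delay_large: int) -> bool:
    """True if the discounted small-sooner reward beats the large-later one."""
    return small * discount(delay_small) > large * discount(delay_large)


# Reward 1 now vs reward 2 two steps later, then the same choice seen 10 steps ahead.
for name, d in [("geometric", geometric(0.5)), ("hyperbolic", hyperbolic(1.0))]:
    print(name, prefers_sooner(d, 1.0, 2.0, 0, 2), prefers_sooner(d, 1.0, 2.0, 10, 12))
# geometric True True
# hyperbolic True False
```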

Nonparametric General Reinforcement Learning

no code implementations • 28 Nov 2016 • Jan Leike

However, there are Bayesian approaches to general RL that satisfy objective optimality guarantees: We prove that Thompson sampling is asymptotically optimal in stochastic environments in the sense that its value converges to the value of the optimal policy.

General Reinforcement Learning, reinforcement-learning, +2 more

Exploration Potential

no code implementations • 16 Sep 2016 • Jan Leike

We introduce exploration potential, a quantity that measures how much a reinforcement learning agent has explored its environment class.

Multi-Armed Bandits, reinforcement-learning, +1 more

A Formal Solution to the Grain of Truth Problem

no code implementations • 16 Sep 2016 • Jan Leike, Jessica Taylor, Benya Fallenstein

In this paper we present a formal and general solution to the full grain of truth problem: we construct a class of policies that contains all computable policies as well as Bayes-optimal policies for every lower semicomputable prior over the class.

Thompson Sampling

Loss Bounds and Time Complexity for Speed Priors

no code implementations • 12 Apr 2016 • Daniel Filan, Marcus Hutter, Jan Leike

On a polynomial time computable sequence our speed prior is computable in exponential time.

Thompson Sampling is Asymptotically Optimal in General Environments

no code implementations • 25 Feb 2016 • Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in a countable class of general stochastic environments.

reinforcement-learning, Reinforcement Learning (RL), +1 more
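Thompson sampling draws one environment from the posterior and then acts optimally as if that sample were the true environment. The paper works with countable classes of general stochastic environments. The sketch below only illustrates the sampling idea in the simplest special case, a Bernoulli bandit with independent Beta(1, 1) priors that is resampled every step. That setting is an assumption made for brevity, and the names and parameters are illustrative.

```python
import random
from typing import List, Sequence


def thompson_bandit(means: Sequence[float], horizon: int, seed: int = 0) -> List[int]:
    """Run Beta-Bernoulli Thompson sampling; return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(means)
    alpha = [1.0] * k  # 1 + observed successes per arm (Beta posterior)
    beta = [1.0] * k   # 1 + observed failures per arm
    pulls = [0] * k
    for _ in range(horizon):
        # Sample one plausible environment (a mean per arm) from the posterior...
        sampled = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        # ...and act optimally for that sample.
        arm = max(range(k), key=lambda a: sampled[a])
        reward = 1 if rng.random() < means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_bandit([0.2, 0.8], horizon=2000)
print(pulls)  # the 0.8 arm should receive the large majority of pulls
```

Exploration here comes entirely from posterior uncertainty: an arm keeps being tried only while its sampled mean can still plausibly come out on top.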

On the Computability of AIXI

no code implementations • 19 Oct 2015 • Jan Leike, Marcus Hutter

Solomonoff induction and the reinforcement learning agent AIXI are proposed answers to this question.

BIG-bench Machine Learning, reinforcement-learning, +1 more

Bad Universal Priors and Notions of Optimality

no code implementations • 16 Oct 2015 • Jan Leike, Marcus Hutter

A big open question of algorithmic information theory is the choice of the universal Turing machine (UTM).

Open-Ended Question Answering

On the Computability of Solomonoff Induction and Knowledge-Seeking

no code implementations • 15 Jul 2015 • Jan Leike, Marcus Hutter

Solomonoff induction is held as a gold standard for learning, but it is known to be incomputable.

reinforcement-learning, Reinforcement Learning (RL)

Solomonoff Induction Violates Nicod's Criterion

no code implementations • 15 Jul 2015 • Jan Leike, Marcus Hutter

Nicod's criterion states that observing a black raven is evidence for the hypothesis H that all ravens are black.
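Formally, write H for the hypothesis that all ravens are black, and read "evidence for" as an increase in probability. That is the usual Bayesian reading and is assumed here. Nicod's criterion then says that, for any object a, observing that a is a black raven raises the probability of H:

```latex
H \equiv \forall x\,\big(R(x) \rightarrow B(x)\big), \qquad
P\big(H \mid R(a) \wedge B(a)\big) > P(H)
```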

Sequential Extensions of Causal and Evidential Decision Theory

no code implementations • 24 Jun 2015 • Tom Everitt, Jan Leike, Marcus Hutter

Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward.

Decision Making

A Definition of Happiness for Reinforcement Learning Agents

no code implementations • 18 May 2015 • Mayank Daswani, Jan Leike

What is happiness for reinforcement learning agents?

reinforcement-learning, Reinforcement Learning (RL)

Indefinitely Oscillating Martingales

no code implementations • 14 Aug 2014 • Jan Leike, Marcus Hutter

We construct a class of nonnegative martingale processes that oscillate indefinitely with high probability.
