no code implementations • 14 Dec 2023 • Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu
Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior: for example, to evaluate whether a model faithfully followed instructions or generated safe outputs.
1 code implementation • 7 Dec 2022 • Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt
Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect.
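This entry corresponds to the contrast-consistent search (CCS) approach, which instead probes a model's internal activations for truth-like structure without supervision. A minimal sketch of the CCS objective, assuming a probe has already mapped the hidden states of a statement and its negation to probabilities `p_pos` and `p_neg` (the probe and its training loop are omitted):

```python
import torch

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    """p_pos, p_neg: probe probabilities on a contrast pair (statement, negation)."""
    # Consistency: the probe should satisfy P(true | x+) = 1 - P(true | x-).
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: rules out the degenerate solution p_pos = p_neg = 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()
```

The consistency term requires a statement and its negation to receive complementary probabilities, while the confidence term excludes the trivial probe that outputs 0.5 everywhere.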
3 code implementations • 20 May 2021 • Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt
Recent models such as GPT-Neo can pass approximately 20% of the test cases of introductory problems, indicating that machine learning models are now beginning to learn how to code.
Ranked #8 on Code Generation on APPS
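The pass-rate metric above is functional correctness: run each generated program against held-out input/output pairs and count the fraction it passes. A hedged sketch of such a checker (names are illustrative; this is not the official APPS harness):

```python
import subprocess

def test_case_pass_rate(program: str, tests: list[tuple[str, str]], timeout: float = 4.0) -> float:
    """program: path to a generated Python file; tests: (stdin, expected stdout) pairs."""
    passed = 0
    for stdin_data, expected in tests:
        try:
            result = subprocess.run(
                ["python", program],
                input=stdin_data,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            # A test case passes if the program exits cleanly with the expected output.
            if result.returncode == 0 and result.stdout.strip() == expected.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            continue  # a non-terminating program fails the test case
    return passed / len(tests) if tests else 0.0
```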
2 code implementations • 10 Mar 2021 • Dan Hendrycks, Collin Burns, Anya Chen, Spencer Ball
We address this bottleneck within the legal domain by introducing the Contract Understanding Atticus Dataset (CUAD), a new dataset for legal contract review.
1 code implementation • CVPR 2021 • Collin Burns, Jacob Steinhardt
Feature alignment is an approach to improving robustness to distribution shift that matches the distribution of feature activations between the training distribution and test distribution.
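As one concrete, simplified instantiation of this idea, feature statistics computed on training data can be matched against those computed on test data; the sketch below penalizes gaps in the first two moments, though the paper's exact alignment procedure may differ:

```python
import torch

def moment_matching_penalty(train_feats: torch.Tensor, test_feats: torch.Tensor) -> torch.Tensor:
    """train_feats, test_feats: (batch, dim) activations from the same layer."""
    # Penalize gaps between the first two moments of the activation distributions.
    mean_gap = (train_feats.mean(dim=0) - test_feats.mean(dim=0)).pow(2).sum()
    var_gap = (train_feats.var(dim=0) - test_feats.var(dim=0)).pow(2).sum()
    return mean_gap + var_gap
```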
4 code implementations • 5 Mar 2021 • Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt
To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.
Ranked #96 on Math Word Problem Solving on MATH
12 code implementations • 7 Sep 2020 • Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
By comprehensively evaluating the breadth and depth of a model's academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.
Ranked #60 on Multi-task Language Understanding on MMLU
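MMLU-style evaluation is typically multiple choice: the model's answer is the option letter assigned the highest log-probability as a continuation of the question prompt, and accuracy is averaged across tasks. A hedged sketch, where `choice_logprob` is an assumed helper rather than a real API:

```python
CHOICES = ("A", "B", "C", "D")

def answer_question(prompt: str, choice_logprob) -> str:
    """prompt: few-shot examples plus the question, ending in 'Answer:'.
    choice_logprob(prompt, continuation) -> float is assumed, not a real API."""
    scores = {c: choice_logprob(prompt, " " + c) for c in CHOICES}
    return max(scores, key=scores.get)
```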
2 code implementations • 5 Aug 2020 • Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt
We show how to assess a language model's knowledge of basic concepts of morality.
Ranked #1 on Average on hendrycks2020ethics
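A hedged sketch of the simplest form this assessment can take, binary accuracy on scenarios labeled morally acceptable or not (the dataset format here is assumed for illustration):

```python
def binary_accuracy(predict, scenarios: list[str], labels: list[int]) -> float:
    """predict: callable mapping a scenario to 0 (acceptable) or 1 (unacceptable)."""
    correct = sum(int(predict(s) == y) for s, y in zip(scenarios, labels))
    return correct / len(labels)
```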
no code implementations • 7 Jul 2020 • Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, David P. Woodruff
We show that for both problems, in dimensions $d=1, 2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$, the complexity of SGD for strongly convex functions such as the bias-regularized SVM, which is known to be tight in general even for $d=1$.
1 code implementation • 29 Mar 2019 • Collin Burns, Jesse Thomason, Wesley Tansey
In science and medicine, model interpretations may be reported as discoveries of natural phenomena or used to guide patient treatments.