1 code implementation • 22 Apr 2024 • Jan-Philipp Fränken, Eric Zelikman, Rafael Rafailov, Kanishk Gandhi, Tobias Gerstenberg, Noah D. Goodman
On single-turn dialogue and summarization, a SAMI-trained mistral-7b outperforms the initial pretrained model, with win rates between 66% and 77%.
1 code implementation • 14 Mar 2024 • Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman
Crucially, these improvements require no fine-tuning on these tasks.
no code implementations • 10 Oct 2023 • Eric Zelikman, Wanjing Anya Ma, Jasmine E. Tran, Diyi Yang, Jason D. Yeatman, Nick Haber
Developing an educational test can be expensive and time-consuming, as each item must be written by experts and then evaluated by collecting hundreds of student responses.
1 code implementation • 3 Oct 2023 • Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai
In this work, we use a language-model-infused scaffolding program to improve itself.
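As an illustration of this recursive setup, the sketch below shows a scaffold handing its own source code to a language model and keeping a rewrite only when a utility score improves. The helpers query_lm and evaluate_utility are hypothetical placeholders, not the paper's released code.

```python
# Minimal sketch of a self-improving scaffold. query_lm and evaluate_utility
# are hypothetical placeholders, not the paper's implementation.

SEED_IMPROVER = (
    "You are given a program that improves programs. Rewrite it so that it "
    "scores higher on the utility function. Return only code.\n\n{program}"
)

def query_lm(prompt: str) -> str:
    # Placeholder: swap in a real language-model API call here.
    return prompt

def evaluate_utility(program_source: str) -> float:
    # Placeholder: run the candidate scaffold on downstream tasks and return a score.
    return 0.0

def self_improve(scaffold_source: str, rounds: int = 3) -> str:
    best_source, best_score = scaffold_source, evaluate_utility(scaffold_source)
    for _ in range(rounds):
        # Hand the current scaffold to the LM as text and ask for an improved version.
        candidate = query_lm(SEED_IMPROVER.format(program=best_source))
        score = evaluate_utility(candidate)
        if score > best_score:  # keep a rewrite only if it measurably helps
            best_source, best_score = candidate, score
    return best_source
```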
1 code implementation • 21 Sep 2023 • Elisa Kreiss, Eric Zelikman, Christopher Potts, Nick Haber
None of the methods is successful with ContextRef, but we show that careful fine-tuning yields substantial improvements.
no code implementations • 11 Sep 2023 • Ruocheng Wang, Eric Zelikman, Gabriel Poesia, Yewen Pu, Nick Haber, Noah D. Goodman
Because generation with state-of-the-art LLMs is prohibitively expensive, we add an intermediate step to filter the set of hypotheses that will be implemented as programs: we either ask the LLM to summarize them into a smaller set of hypotheses or ask human annotators to select a subset.
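A minimal sketch of the LLM-summarization variant of this filtering step is shown below; the prompt wording and the query_lm helper are illustrative assumptions rather than the paper's implementation.

```python
# Sketch of the filtering step: condense a large hypothesis set before paying
# the cost of turning each hypothesis into a program. query_lm is a
# hypothetical placeholder for an LLM call.

SUMMARIZE_PROMPT = (
    "Below are {n} candidate hypotheses about the hidden rule. "
    "Merge near-duplicates and return the {k} most distinct ones, "
    "one per line.\n\n{hypotheses}"
)

def query_lm(prompt: str) -> str:
    # Placeholder: replace with a real language-model call.
    return ""

def filter_hypotheses(hypotheses: list[str], k: int = 10) -> list[str]:
    prompt = SUMMARIZE_PROMPT.format(
        n=len(hypotheses), k=k, hypotheses="\n".join(hypotheses)
    )
    reply = query_lm(prompt)
    kept = [line.strip() for line in reply.splitlines() if line.strip()]
    return kept[:k]  # only these survivors get implemented as programs
```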
1 code implementation • 20 Jun 2023 • Yuhao Nie, Eric Zelikman, Andea Scott, Quentin Paletta, Adam Brandt
Furthermore, we feed the future sky images generated by the video prediction models into 15-minute-ahead probabilistic solar forecasting for a 30-kW rooftop PV system, and compare the results with an end-to-end deep learning baseline model (SUNSET) and a smart persistence model.
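For reference, a smart persistence baseline is commonly formulated as holding the current clear-sky index fixed over the forecast horizon; the sketch below follows that standard formulation, with illustrative numbers rather than values from the paper.

```python
# Sketch of a smart persistence baseline as commonly defined for PV forecasting:
# assume the clear-sky index observed now persists over the forecast horizon.
# Variable names and example values are illustrative, not from the paper.

def smart_persistence(power_now_kw: float,
                      clear_sky_now_kw: float,
                      clear_sky_future_kw: float) -> float:
    """15-minute-ahead forecast: scale the clear-sky power at t+15min
    by the clear-sky index observed at time t."""
    if clear_sky_now_kw <= 0.0:          # e.g., night-time; avoid division by zero
        return 0.0
    clear_sky_index = power_now_kw / clear_sky_now_kw
    return clear_sky_index * clear_sky_future_kw

# Example for a 30-kW system: 18 kW measured now under a 24-kW clear-sky
# estimate gives an index of 0.75; with 25 kW of clear-sky output expected
# 15 minutes ahead, the forecast is 0.75 * 25 = 18.75 kW.
print(smart_persistence(18.0, 24.0, 25.0))
```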
1 code implementation • 16 Jun 2023 • Eric Zelikman, Qian Huang, Percy Liang, Nick Haber, Noah D. Goodman
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
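The scale of that communication cost is easy to estimate. The back-of-the-envelope sketch below assumes fp16 gradients, eight data-parallel workers, and a ring all-reduce exchange pattern; none of these specifics are taken from the paper.

```python
# Back-of-the-envelope estimate of per-step gradient traffic for data-parallel
# training with ring all-reduce. Model size, precision, and worker count are
# illustrative assumptions, not numbers from the paper.

def allreduce_bytes_per_step(num_params: int, bytes_per_grad: int = 2,
                             num_workers: int = 8) -> float:
    grad_bytes = num_params * bytes_per_grad
    # Ring all-reduce: each worker sends roughly 2*(n-1)/n times the gradient size.
    return 2 * (num_workers - 1) / num_workers * grad_bytes

# A 7B-parameter model with fp16 gradients on 8 workers exchanges roughly
# 24.5 GB per worker on every optimizer step.
print(allreduce_bytes_per_step(7_000_000_000) / 1e9, "GB")
```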
no code implementations • 6 Jun 2023 • Gabriel Poesia, Kanishk Gandhi, Eric Zelikman, Noah D. Goodman
In experiments on the PrOntoQA, ProofWriter and Syllogism Validity datasets, LogicGuide significantly improves the performance of GPT-3, GPT-3.5 Turbo and LLaMA (accuracy gains of up to 35%), while drastically reducing content effects -- the interference between unwanted prior assumptions and reasoning, which both humans and language models suffer from.
1 code implementation • 20 Dec 2022 • Eric Zelikman, Qian Huang, Gabriel Poesia, Noah D. Goodman, Nick Haber
Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs.
Ranked #8 on Code Generation on HumanEval
1 code implementation • 16 Nov 2022 • Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda
We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models.
1 code implementation • 21 May 2022 • Elisa Kreiss, Cynthia Bennett, Shayan Hooshmand, Eric Zelikman, Meredith Ringel Morris, Christopher Potts
Few images on the Web receive alt-text descriptions that would make them accessible to blind and low vision (BLV) users.
1 code implementation • 28 Mar 2022 • Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman
We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30× larger state-of-the-art language model on CommonsenseQA.
Ranked #17 on Common Sense Reasoning on CommonsenseQA
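The bootstrapping loop behind STaR (sample a rationale, keep it only when it yields the correct answer, then fine-tune on the kept examples) can be sketched compactly; the helpers below are hypothetical placeholders, not the released code.

```python
# Compact sketch of one STaR bootstrapping iteration. query_lm, extract_answer,
# and finetune are hypothetical placeholders, not the paper's implementation.

def query_lm(prompt: str) -> str:
    # Placeholder: replace with a real language-model call.
    return ""

def extract_answer(rationale: str) -> str:
    # Placeholder: parse the final answer out of the generated rationale.
    return rationale.rsplit(" ", 1)[-1]

def finetune(examples: list[tuple[str, str]]) -> None:
    # Placeholder: fine-tune the base model on (question, rationale) pairs.
    pass

def star_iteration(dataset: list[tuple[str, str]]) -> None:
    kept = []
    for question, gold_answer in dataset:
        rationale = query_lm(f"{question}\nLet's think step by step.")
        if extract_answer(rationale) == gold_answer:
            kept.append((question, rationale))  # keep only rationales that reach the right answer
    finetune(kept)  # the fine-tuned model generates rationales in the next iteration
```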
no code implementations • 9 Oct 2020 • Eric Zelikman, Sharon Zhou, Jeremy Irvin, Cooper Raterink, Hao Sheng, Anand Avati, Jack Kelly, Ram Rajagopal, Andrew Y. Ng, David Gagne
Advancing probabilistic solar forecasting methods is essential to supporting the integration of solar energy into the electricity grid.
1 code implementation • ICLR 2021 • Sharon Zhou, Eric Zelikman, Fred Lu, Andrew Y. Ng, Gunnar Carlsson, Stefano Ermon
Learning disentangled representations is regarded as a fundamental task for improving the generalization, robustness, and interpretability of generative models.
no code implementations • 26 May 2020 • Eric Zelikman, Christopher Healy, Sharon Zhou, Anand Avati
Calibrated uncertainty estimates in machine learning are crucial to many fields such as autonomous vehicles, medicine, and weather and climate forecasting.
no code implementations • 20 Apr 2020 • Eric Zelikman, William Yin, Kenneth Wang
A significant challenge in developing AI that can generalize well is designing agents that learn about their world without being told what to learn, and apply that learning to challenges with sparse rewards.
1 code implementation • 22 Mar 2018 • Eric Zelikman, Richard Socher
We introduce contextual salience (CoSal), a measure of word importance that uses the distribution of context vectors to normalize distances and weights.
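One way to realize this idea is a Mahalanobis-style distance of each word vector from the mean of its context, normalized by the context's covariance; the sketch below follows that reading and is not necessarily the paper's exact formulation.

```python
# Illustrative take on context-normalized word importance: score each word by
# the Mahalanobis distance of its vector from the mean of the context's
# vectors. This follows the idea described above, not the paper's exact method.
import numpy as np

def contextual_salience(word_vectors: np.ndarray) -> np.ndarray:
    """word_vectors: (n_words, dim) array of embeddings for one context.
    Returns one importance score per word."""
    mean = word_vectors.mean(axis=0)
    centered = word_vectors - mean
    # Regularized covariance of the context's vectors normalizes distances.
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(word_vectors.shape[1])
    cov_inv = np.linalg.inv(cov)
    return np.sqrt(np.einsum("nd,dk,nk->n", centered, cov_inv, centered))
```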