no code implementations • 8 May 2024 • Canaan Breiss, Alexis Ross, Amani Maina-Kilaas, Roger Levy, Jacob Andreas
We propose an interactive approach to language learning that utilizes linguistic acceptability judgments from an informant (a competent language user) to learn a grammar.
no code implementations • 7 May 2024 • Alexis Ross, Jacob Andreas
AdapT has two components: (1) a collection of simulated Bayesian student models that can be used to evaluate automated teaching methods; and (2) a platform for evaluation with human students, to characterize the real-world effectiveness of these methods.
1 code implementation • 5 Jul 2023 • Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, Yoon Kim
The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning ability.
1 code implementation • 21 Jun 2023 • Mike D'Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg, Tom Hope, Doug Downey
Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in high-level feedback and to choose the best of many possible ways to update the manuscript in response.
no code implementations • 15 Jun 2023 • Ian R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Aaron Kirtland, Alexis Ross, Alisa Liu, Andrew Gritsevskiy, Daniel Wurgaft, Derik Kauffman, Gabriel Recchia, Jiacheng Liu, Joe Cavanagh, Max Weiss, Sicong Huang, The Floating Droid, Tom Tseng, Tomasz Korbak, Xudong Shen, Yuhui Zhang, Zhengping Zhou, Najoung Kim, Samuel R. Bowman, Ethan Perez
Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data.
1 code implementation • 26 May 2023 • Marcos Treviso, Alexis Ross, Nuno M. Guerreiro, André F. T. Martins
Selective rationales and counterfactual examples have emerged as two effective, complementary classes of interpretability methods for analyzing and training NLP models.
no code implementations • 24 Oct 2022 • Alexis Ross, Matthew E. Peters, Ana Marasović
Specifically, we evaluate how training self-rationalization models with free-text rationales affects robustness to spurious correlations in fine-tuned encoder-decoder and decoder-only models of six different sizes.
1 code implementation • ACL 2022 • Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, Matt Gardner
We craft a set of operations to modify the control codes, which in turn steer generation towards targeted attributes.
no code implementations • EMNLP 2021 • Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith
In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems.
1 code implementation • Findings (ACL) 2021 • Alexis Ross, Ana Marasović, Matthew E. Peters
Humans have been shown to give contrastive explanations, which explain why an observed event happened rather than some other counterfactual event (the contrast case).
1 code implementation • NeurIPS 2021 • Alexis Ross, Himabindu Lakkaraju, Osbert Bastani
As machine learning models are increasingly deployed in high-stakes domains such as legal and financial decision-making, there has been growing interest in post-hoc methods for generating counterfactual explanations.
no code implementations • IJCNLP 2019 • Alexis Ross, Ellie Pavlick
In natural language inference (NLI), contexts are considered veridical if they allow us to infer that their underlying propositions make true claims about the real world.
no code implementations • SEMEVAL 2019 • Najoung Kim, Roma Patel, Adam Poliak, Alex Wang, Patrick Xia, R. Thomas McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, Ellie Pavlick
Our results show that pretraining on language modeling performs best on average across our probing tasks, supporting its widespread use for pretraining state-of-the-art NLP models, and that CCG supertagging and NLI pretraining perform comparably.