no code implementations • EMNLP (ACL) 2021 • Alane Suhr, Clara Vania, Nikita Nangia, Maarten Sap, Mark Yatskar, Samuel R. Bowman, Yoav Artzi
Even though crowdsourcing is such a fundamental tool in NLP, its use is largely guided by common practices and the personal experience of researchers.
1 code implementation • 9 Apr 2024 • Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr
We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control.
no code implementations • 14 Nov 2023 • Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr
To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning.
1 code implementation • 31 Oct 2023 • Yanai Elazar, Akshita Bhagia, Ian Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, Hanna Hajishirzi, Noah A. Smith, Jesse Dodge
We open-source WIMBD's code and artifacts to provide a standard set of evaluations for new text-based corpora and to encourage more analyses and transparency around them.
1 code implementation • 17 Oct 2023 • Melanie Sclar, Yejin Choi, Yulia Tsvetkov, Alane Suhr
In this work, we focus on LLM sensitivity to a quintessential class of meaning-preserving design choices: prompt formatting.
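The sensitivity analysis described here can be sketched as a small harness that scores the same task content under several meaning-preserving prompt formats and reports the accuracy spread. This is a minimal illustration, not the paper's code: the templates, the `render` helper, and the model interface are all hypothetical.

```python
# Hypothetical sketch: measure how accuracy varies across meaning-preserving
# prompt formats. Templates and the model interface are illustrative only.

def render(template, question, options):
    """Fill a prompt template with the same underlying task content."""
    return template.format(q=question, opts=", ".join(options))

# Meaning-preserving variants: same content, different separators and casing.
TEMPLATES = [
    "Question: {q}\nOptions: {opts}\nAnswer:",
    "QUESTION: {q} OPTIONS: {opts} ANSWER:",
    "Q: {q} | {opts} | A:",
]

def format_spread(model, examples):
    """Return (min, max) accuracy over prompt formats for one model.

    `model` maps a prompt string to a predicted answer string;
    `examples` is a list of (question, options, gold) triples.
    """
    accuracies = []
    for template in TEMPLATES:
        correct = 0
        for question, options, gold in examples:
            prediction = model(render(template, question, options))
            correct += prediction == gold
        accuracies.append(correct / len(examples))
    return min(accuracies), max(accuracies)
```

A large gap between the minimum and maximum accuracy indicates high formatting sensitivity for that model on that task.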
no code implementations • NeurIPS 2023 • Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi
We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness).
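The two kinds of fine-grainedness described above can be sketched as a per-segment reward that combines several feedback-specific reward models with weights. This is a toy illustration of the idea only; the sentence splitter, reward models, and weights are hypothetical, not the paper's implementation.

```python
# Hypothetical sketch of the fine-grained reward idea: one reward per
# segment (here, a sentence), computed as a weighted sum over multiple
# reward models for different feedback types. All components are toys.

def split_sentences(text):
    """Naive sentence segmentation, sufficient for this sketch."""
    return [s.strip() for s in text.split(".") if s.strip()]

def fine_grained_reward(text, reward_models, weights):
    """Return one reward per segment of `text`.

    Each segment's reward is the weighted sum of the scores assigned by
    each reward model (e.g., factuality, relevance) to that segment.
    """
    segments = split_sentences(text)
    return [
        sum(w * rm(seg) for rm, w in zip(reward_models, weights))
        for seg in segments
    ]
```

Because each segment gets its own scalar reward, a policy-gradient learner can assign credit to individual sentences rather than to the whole generation.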
no code implementations • 1 Jun 2023 • Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia Tsvetkov
We present SymbolicToM, a plug-and-play approach to reason about the belief states of multiple characters in reading comprehension tasks via explicit symbolic representation.
1 code implementation • 27 Apr 2023 • Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
We find that the task remains extremely challenging, including for GPT-4, whose generated disambiguations are considered correct only 32% of the time in human evaluation, compared to 90% for disambiguations in our dataset.
no code implementations • 28 Jan 2023 • Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, Roy Fox
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.
1 code implementation • NeurIPS 2023 • Alane Suhr, Yoav Artzi
We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions.
no code implementations • 29 Nov 2022 • Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi
We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines.
1 code implementation • Findings (EMNLP) 2021 • Anna Effenberger, Eva Yan, Rhia Singh, Alane Suhr, Yoav Artzi
We analyze language change over time in a collaborative, goal-oriented instructional task, where utility-maximizing participants form conventions and increase their expertise.
no code implementations • 10 Aug 2021 • Noriyuki Kojima, Alane Suhr, Yoav Artzi
We study continual learning for natural language instruction generation, by observing human users' instruction execution.
no code implementations • ACL 2020 • Alane Suhr, Ming-Wei Chang, Peter Shaw, Kenton Lee
We study the task of cross-database semantic parsing (XSP), where a system that maps natural language utterances to executable SQL queries is evaluated on databases unseen during training.
no code implementations • IJCNLP 2019 • Alane Suhr, Claudia Yan, Charlotte Schluger, Stanley Yu, Hadi Khader, Marwa Mouallem, Iris Zhang, Yoav Artzi
We study a collaborative scenario where a user not only instructs a system to complete tasks, but also acts alongside it.
1 code implementation • 23 Sep 2019 • Alane Suhr, Yoav Artzi
We show that the performance of existing models (Li et al., 2019; Tan and Bansal, 2019) is relatively robust to this potential bias.
4 code implementations • CVPR 2019 • Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi
We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task.
Ranked #10 on Vision and Language Navigation on Touchdown Dataset
1 code implementation • ACL 2019 • Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi
We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language.
no code implementations • ACL 2018 • Matt Gardner, Pradeep Dasigi, Srinivasan Iyer, Alane Suhr, Luke Zettlemoyer
Semantic parsing, the study of translating natural language utterances into machine-executable programs, is a well-established research area and has applications in question answering, instruction following, voice assistants, and code generation.
1 code implementation • ACL 2018 • Alane Suhr, Yoav Artzi
We propose a learning approach for mapping context-dependent sequential instructions to actions.
1 code implementation • NAACL 2018 • Alane Suhr, Srinivasan Iyer, Yoav Artzi
We propose a context-dependent model to map utterances within an interaction to executable formal queries.
no code implementations • 2 Oct 2017 • Stephanie Zhou, Alane Suhr, Yoav Artzi
To understand language in complex environments, agents must reason about the full range of language inputs and their correspondence to the world.
no code implementations • ACL 2017 • Alane Suhr, Mike Lewis, James Yeh, Yoav Artzi
We present a new visual reasoning language dataset, containing 92,244 pairs of examples of natural statements grounded in synthetic images with 3,962 unique sentences.