Search Results for author: Govind Thattai

Found 25 papers, 7 papers with code

Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI

no code implementations • 9 Aug 2023 • Hangjie Shi, Leslie Ball, Govind Thattai, Desheng Zhang, Lucy Hu, Qiaozi Gao, Suhaila Shakiah, Xiaofeng Gao, Aishwarya Padmakumar, Bofei Yang, Cadence Chung, Dinakar Guthy, Gaurav Sukhatme, Karthika Arumugam, Matthew Wen, Osman Ipek, Patrick Lange, Rohan Khanna, Shreyas Pansare, Vasu Sharma, Chao Zhang, Cris Flagg, Daniel Pressel, Lavina Vaz, Luke Dai, Prasoon Goyal, Sattvik Sahai, Shaohua Liu, Yao Lu, Anna Gottardi, Shui Hu, Yang Liu, Dilek Hakkani-Tur, Kate Bland, Heather Rocker, James Jeun, Yadunandana Rao, Michael Johnston, Akshaya Iyengar, Arindam Mandal, Prem Natarajan, Reza Ghanadan

The Alexa Prize program has empowered numerous university students to explore, experiment, and showcase their talents in building conversational agents through challenges like the SocialBot Grand Challenge and the TaskBot Challenge.

Paper
Add Code

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

no code implementations • 7 Aug 2023 • Nirbhay Modhe, Qiaozi Gao, Ashwin Kalyan, Dhruv Batra, Govind Thattai, Gaurav Sukhatme

Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions.

Offline RL reinforcement-learning +1

Paper
Add Code

LEMMA: Learning Language-Conditioned Multi-Robot Manipulation

no code implementations • 2 Aug 2023 • Ran Gong, Xiaofeng Gao, Qiaozi Gao, Suhaila Shakiah, Govind Thattai, Gaurav S. Sukhatme

We introduce a benchmark for LanguagE-Conditioned Multi-robot MAnipulation (LEMMA) focused on task allocation and long-horizon object manipulation based on human language instructions in a tabletop setting.

LEMMA Robot Manipulation

Paper
Add Code

Neural Architecture Search for Parameter-Efficient Fine-tuning of Large Pre-trained Language Models

no code implementations • 26 May 2023 • Neal Lawton, Anoop Kumar, Govind Thattai, Aram Galstyan, Greg Ver Steeg

Parameter-efficient tuning (PET) methods fit pre-trained language models (PLMs) to downstream tasks by either computing a small compressed update for a subset of model parameters, or appending and fine-tuning a small number of new model parameters to the pre-trained network.

Neural Architecture Search

Paper
Add Code

Alexa Arena: A User-Centric Interactive Platform for Embodied AI

1 code implementation • NeurIPS 2023 • Qiaozi Gao, Govind Thattai, Suhaila Shakiah, Xiaofeng Gao, Shreyas Pansare, Vasu Sharma, Gaurav Sukhatme, Hangjie Shi, Bofei Yang, Desheng Zheng, Lucy Hu, Karthika Arumugam, Shui Hu, Matthew Wen, Dinakar Guthy, Cadence Chung, Rohan Khanna, Osman Ipek, Leslie Ball, Kate Bland, Heather Rocker, Yadunandana Rao, Michael Johnston, Reza Ghanadan, Arindam Mandal, Dilek Hakkani Tur, Prem Natarajan

We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research.

Instruction Following

Paper
Code

Language-Informed Transfer Learning for Embodied Household Activities

no code implementations • 12 Jan 2023 • Yuqian Jiang, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme

For service robots to become general-purpose in everyday household environments, they need not only a large library of primitive skills, but also the ability to quickly learn novel tasks specified by users.

Paper
Add Code

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

no code implementations • CVPR 2023 • Da Yin, Feng Gao, Govind Thattai, Michael Johnston, Kai-Wei Chang

A key goal for the advancement of AI is to develop technologies that serve the needs not just of one group but of all communities regardless of their geographical region.

Paper
Add Code

OpenD: A Benchmark for Language-Driven Door and Drawer Opening

no code implementations • 10 Dec 2022 • Yizhou Zhao, Qiaozi Gao, Liang Qiu, Govind Thattai, Gaurav S. Sukhatme

We introduce OPEND, a benchmark for learning how to use a hand to open cabinet doors or drawers in a photo-realistic and physics-reliable simulation environment driven by language instruction.

Paper
Add Code

TPA-Net: Generate A Dataset for Text to Physics-based Animation

no code implementations • 25 Nov 2022 • Yuxing Qiu, Feng Gao, Minchen Li, Govind Thattai, Yin Yang, Chenfanfu Jiang

Recent breakthroughs in Vision-Language (V&L) joint research have achieved remarkable results in various text-driven tasks.

Physical Simulations

Paper
Add Code

Towards Reasoning-Aware Explainable VQA

no code implementations • 9 Nov 2022 • Rakesh Vaideeswaran, Feng Gao, Abhinav Mathur, Govind Thattai

Our method generates human-readable textual explanations while maintaining SOTA VQA accuracy on the GQA-REX (77. 49%) and VQA-E (71. 48%) datasets.

Decoder Explanation Generation +3

Paper
Add Code

CH-MARL: A Multimodal Benchmark for Cooperative, Heterogeneous Multi-Agent Reinforcement Learning

no code implementations • 26 Aug 2022 • Vasu Sharma, Prasoon Goyal, Kaixiang Lin, Govind Thattai, Qiaozi Gao, Gaurav S. Sukhatme

We propose a multimodal (vision-and-language) benchmark for cooperative and heterogeneous multi-agent learning.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Add Code

A Multi-level Alignment Training Scheme for Video-and-Language Grounding

no code implementations • 22 Apr 2022 • Yubo Zhang, Feiyang Niu, Qing Ping, Govind Thattai

To solve video-and-language grounding tasks, the key is for the network to understand the connection between the two modalities.

Retrieval Semantic Similarity +1

Paper
Add Code

DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following

2 code implementations • 27 Feb 2022 • Xiaofeng Gao, Qiaozi Gao, Ran Gong, Kaixiang Lin, Govind Thattai, Gaurav S. Sukhatme

Language-guided Embodied AI benchmarks requiring an agent to navigate an environment and manipulate objects typically allow one-way communication: the human user gives a natural language command to the agent, and the agent can only follow the command passively.

Instruction Following Navigate

Paper
Code

Privacy Preserving Visual Question Answering

no code implementations • 15 Feb 2022 • Cristian-Paul Bara, Qing Ping, Abhinav Mathur, Govind Thattai, Rohith MV, Gaurav S. Sukhatme

We introduce a novel privacy-preserving methodology for performing Visual Question Answering on the edge.

Privacy Preserving Question Answering +1

Paper
Add Code

Learning to Act with Affordance-Aware Multimodal Neural SLAM

1 code implementation • 24 Jan 2022 • Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme

With the proposed Affordance-aware Multimodal Neural SLAM (AMSLAM) approach, we obtain more than 40% improvement over prior published work on the ALFRED benchmark and set a new state-of-the-art generalization performance at a success rate of 23. 48% on the test unseen scenes.

Efficient Exploration Test unseen

Paper
Code

Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning

no code implementations • 21 Jan 2022 • Tongzhou Mu, Kaixiang Lin, Feiyang Niu, Govind Thattai

We present a two-step hybrid reinforcement learning (RL) policy that is designed to generate interpretable and robust hierarchical policies on the RL problem with graph-based input.

Decision Making reinforcement-learning +3

Paper
Add Code

A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering

no code implementations • 14 Jan 2022 • Feng Gao, Qing Ping, Govind Thattai, Aishwarya Reganti, Ying Nian Wu, Prem Natarajan

Outside-knowledge visual question answering (OK-VQA) requires the agent to comprehend the image, make use of relevant knowledge from the entire web, and digest all the information to answer the question.

Generative Question Answering Passage Retrieval +2

Paper
Add Code

Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering

no code implementations • CVPR 2022 • Feng Gao, Qing Ping, Govind Thattai, Aishwarya Reganti, Ying Nian Wu, Prem Natarajan

Most previous works address the problem by first fusing the image and question in the multi-modal space, which is inflexible for further fusion with a vast amount of external knowledge.

Ranked #19 on Visual Question Answering (VQA) on OK-VQA

Generative Question Answering Passage Retrieval +2

Paper
Add Code

Best of Both Worlds: A Hybrid Approach for Multi-Hop Explanation with Declarative Facts

no code implementations • AAAI Workshop CLeaR 2022 • Shane Storks, Qiaozi Gao, Aishwarya Reganti, Govind Thattai

Language-enabled AI systems can answer complex, multi-hop questions to high accuracy, but supporting answers with evidence is a more challenging task which is important for the transparency and trustworthiness to users.

Explanation Generation Retrieval

Paper
Add Code

LUMINOUS: Indoor Scene Generation for Embodied AI Challenges

1 code implementation • 10 Nov 2021 • Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, Gaurav S. Sukhatme

However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts.

Indoor Scene Synthesis Scene Generation

Paper
Code

Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion

1 code implementation • 10 Aug 2021 • Alessandro Suglia, Qiaozi Gao, Jesse Thomason, Govind Thattai, Gaurav Sukhatme

Language-guided robots performing home and office tasks must navigate in and interact with the world.

Navigate Object

Paper
Code

Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation

1 code implementation • CVPR 2021 • Tao Tu, Qing Ping, Govind Thattai, Gokhan Tur, Prem Natarajan

Most existing work for Guesser encode the dialog history as a whole and train the Guesser models from scratch on the GuessWhat?!

Referring Expression Referring Expression Comprehension +2

Paper
Code

Are We There Yet? Learning to Localize in Embodied Instruction Following

no code implementations • 9 Jan 2021 • Shane Storks, Qiaozi Gao, Govind Thattai, Gokhan Tur

Embodied instruction following is a challenging problem requiring an agent to infer a sequence of primitive actions to achieve a goal environment state from complex language and visual inputs.

Instruction Following object-detection +1

Paper
Add Code

Interactive Teaching for Conversational AI

no code implementations • 2 Dec 2020 • Qing Ping, Feiyang Niu, Govind Thattai, Joel Chengottusseriyil, Qiaozi Gao, Aishwarya Reganti, Prashanth Rajagopal, Gokhan Tur, Dilek Hakkani-Tur, Prem Nataraja

Current conversational AI systems aim to understand a set of pre-designed requests and execute related actions, which limits them to evolve naturally and adapt based on human interactions.

Paper
Add Code

LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering

3 code implementations • 21 Nov 2020 • Weixin Liang, Feiyang Niu, Aishwarya Reganti, Govind Thattai, Gokhan Tur

We show that LRTA makes a step towards truly understanding the question while the state-of-the-art model tends to learn superficial correlations from the training data.

Answer Generation Question Answering +1

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.