no code implementations • 27 Nov 2023 • Shiyuan Huang, Robinson Piramuthu, Vicente Ordonez, Shih-Fu Chang, Gunnar A. Sigurdsson
From our experiments, we have observed only a 5.2%-5.8% loss of performance when using just 10% of the video length, which corresponds to 2-4 frames selected from each video.
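A minimal sketch of what length-based frame selection could look like, assuming evenly spaced sampling; the paper's actual selection strategy may differ, and the helper name and frame counts below are illustrative only.

```python
import numpy as np

def subsample_frames(num_frames: int, keep_ratio: float = 0.1, min_frames: int = 2) -> np.ndarray:
    """Pick a small, evenly spaced subset of frame indices from a video.

    Illustrative sketch only: keeps roughly `keep_ratio` of the frames,
    but never fewer than `min_frames`.
    """
    n_keep = max(min_frames, int(round(num_frames * keep_ratio)))
    # Evenly spaced indices over the full video length.
    return np.linspace(0, num_frames - 1, n_keep).astype(int)

# e.g. a 30-frame clip reduced to ~10% of its length -> 3 frames
print(subsample_frames(30))  # [ 0 14 29]
```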
no code implementations • 12 Mar 2023 • Siddharth Singi, Zhanpeng He, Alvin Pan, Sandip Patel, Gunnar A. Sigurdsson, Robinson Piramuthu, Shuran Song, Matei Ciocarlie
In a Human-in-the-Loop paradigm, a robotic agent is able to act mostly autonomously in solving a task, but can request help from an external expert when needed.
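A toy sketch of the Human-in-the-Loop idea described above. The fixed confidence threshold here is an assumption for illustration; the paper's agent learns when to request help rather than using a hand-set rule, and all names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HilAgent:
    """Act autonomously unless the policy is unsure, then defer to an expert.

    The threshold rule is a placeholder for whatever help-requesting
    mechanism the agent actually uses.
    """
    policy: Callable[[object], tuple[int, float]]   # returns (action, confidence)
    expert: Callable[[object], int]                 # external human/oracle fallback
    threshold: float = 0.8

    def act(self, observation: object) -> int:
        action, confidence = self.policy(observation)
        if confidence < self.threshold:
            return self.expert(observation)  # request help from the expert
        return action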
no code implementations • 30 Jan 2023 • Gunnar A. Sigurdsson, Jesse Thomason, Gaurav S. Sukhatme, Robinson Piramuthu
Armed with this intuition, using only a generic vision-language scoring model with minor modifications for 3D encoding and operating in an embodied environment, we demonstrate an absolute performance gain of 9.84% on remote object grounding above state-of-the-art models for REVERIE and of 5.04% on FAO.
1 code implementation • 15 Oct 2022 • Shiyuan Huang, Robinson Piramuthu, Shih-Fu Chang, Gunnar A. Sigurdsson
Specifically, we insert a lightweight Feature Compression Module (FeatComp) into a VideoQA model, which learns to extract task-specific tiny features of as little as 10 bits that are optimal for answering certain types of questions.
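A minimal sketch of one way such a bottleneck could be built: project a frame feature down to a handful of logits and binarize them. This is not the paper's FeatComp implementation; the module name, dimensions, and straight-through binarization below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TinyFeatureCompressor(nn.Module):
    """Compress a feature vector to `n_bits` binary values (illustrative only)."""

    def __init__(self, in_dim: int = 2048, n_bits: int = 10):
        super().__init__()
        self.project = nn.Linear(in_dim, n_bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.project(x)
        hard = (logits > 0).float()      # {0, 1} codes, n_bits per feature
        soft = torch.sigmoid(logits)
        # Straight-through estimator: forward uses hard bits, gradients flow through soft.
        return hard + soft - soft.detach()

codes = TinyFeatureCompressor()(torch.randn(4, 2048))  # shape (4, 10), values in {0, 1}
```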
no code implementations • 12 Mar 2020 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Karteek Alahari
Eye movement and strategic placement of the visual field onto the retina give animals increased resolution of the scene and suppress distracting information.
1 code implementation • CVPR 2020 • Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, João Carreira, Phil Blunsom, Andrew Zisserman
Given this shared embedding we demonstrate that (i) we can map words between the languages, particularly the 'visual' words; (ii) that the shared embedding provides a good initialization for existing unsupervised text-based word translation techniques, forming the basis for our proposed hybrid visual-text mapping algorithm, MUVE; and (iii) our approach achieves superior performance by addressing the shortcomings of text-based methods -- it is more robust, handles datasets with less commonality, and is applicable to low-resource languages.
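A hedged sketch of the retrieval step implied by point (i): once both languages live in a shared (video-grounded) embedding space, a word can be mapped to its nearest neighbor in the other language. The function and inputs below are assumptions; MUVE itself additionally uses this mapping only to initialize a text-based translation method.

```python
import numpy as np

def translate_by_nearest_neighbor(src_vecs: np.ndarray,
                                  tgt_vecs: np.ndarray,
                                  tgt_words: list[str]) -> list[str]:
    """Map each source-language word to the target-language word whose
    embedding is closest by cosine similarity in the shared space."""
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sims = src @ tgt.T                      # (n_src, n_tgt) cosine similarities
    return [tgt_words[i] for i in sims.argmax(axis=1)]
```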
no code implementations • 25 Apr 2018 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari
In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first- and third-person video, making it one of the largest and most diverse egocentric datasets available.
1 code implementation • CVPR 2018 • Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari
Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor).
1 code implementation • ICCV 2017 • Gunnar A. Sigurdsson, Olga Russakovsky, Abhinav Gupta
We present the many kinds of information that will be needed to achieve substantial gains in activity understanding: objects, verbs, intent, and sequential reasoning.
2 code implementations • CVPR 2017 • Gunnar A. Sigurdsson, Santosh Divvala, Ali Farhadi, Abhinav Gupta
Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it.
Ranked #16 on Action Detection on Charades
no code implementations • 25 Jul 2016 • Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta
We conclude that the optimal strategy is to ask as many questions as possible in a HIT (up to 52 binary questions after watching a 30-second video clip in our experiments).
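The intuition behind this conclusion is that watching the clip is a fixed cost that gets amortized over every question asked afterwards. A small worked example, with an assumed 2 seconds per binary answer (the 30-second viewing time comes from the text; the per-answer time is hypothetical):

```python
def cost_per_label(watch_s: float, answer_s: float, questions_per_hit: int) -> float:
    """Worker seconds per binary label when one viewing is amortized
    over all questions in the HIT (illustrative numbers)."""
    return (watch_s + answer_s * questions_per_hit) / questions_per_hit

for q in (1, 10, 52):
    print(q, round(cost_per_label(watch_s=30, answer_s=2, questions_per_hit=q), 1))
# 1 -> 32.0s, 10 -> 5.0s, 52 -> 2.6s per label
```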
1 code implementation • 14 Apr 2016 • Gunnar A. Sigurdsson, Xinlei Chen, Abhinav Gupta
What does a typical visit to Paris look like?
no code implementations • 6 Apr 2016 • Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta
Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects.
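To make the annotation layers concrete, here is a hypothetical record combining the fields listed above; the real dataset's file format, identifiers, and label vocabulary differ.

```python
# Hypothetical example of one annotation record (illustrative only).
annotation = {
    "video_id": "VIDEO_0001",
    "descriptions": ["A person opens a refrigerator and pours a glass of milk."],
    "actions": [
        {"label": "opening a refrigerator", "start_s": 2.1, "end_s": 5.4},
        {"label": "pouring into a glass",   "start_s": 6.0, "end_s": 9.8},
    ],
    "objects": ["refrigerator", "glass", "milk"],
}
```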