Search Results for author: Shoubin Yu

Found 7 papers, 6 papers with code

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

1 code implementation • 29 May 2024 • Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal

Recently, many long video-language understanding approaches have leveraged the reasoning capabilities of Large Language Models (LLMs) to perform long video QA, transforming videos into densely sampled frame captions, and asking LLMs to respond to text queries over captions.

Ranked #1 on Zero-Shot Video Question Answer on IntentQA

Video Understanding • Zero-Shot Video Question Answer

RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives

1 code implementation • 28 May 2024 • Jaehong Yoon, Shoubin Yu, Mohit Bansal

This paper proposes RACCooN, a versatile and user-friendly video-to-paragraph-to-video generative framework that supports multiple video editing capabilities such as removal, addition, and modification, through a unified pipeline.

Attribute • Video Editing

STAR: A Benchmark for Situated Reasoning in Real-World Videos

no code implementations • NeurIPS 2021 • Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B Tenenbaum, Chuang Gan

This paper introduces a new benchmark, Situated Reasoning in Real-World Videos (STAR Benchmark), which evaluates situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos.

Logical Reasoning • Question Answering

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion

1 code implementation • 8 Feb 2024 • Shoubin Yu, Jaehong Yoon, Mohit Bansal

Furthermore, we propose a fusion module designed to compress multimodal queries, maintaining computational efficiency in the LLM while combining additional modalities.

Ranked #1 on Question Answering on SQA3D

Computational Efficiency • Optical Flow Estimation • +2

A Simple LLM Framework for Long-Range Video Question-Answering

1 code implementation • 28 Dec 2023 • Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

Furthermore, we show that a specialized prompt that asks the LLM first to summarize the noisy short-term visual captions and then answer a given input question leads to a significant LVQA performance boost.

Ranked #1 on Zero-Shot Video Question Answer on NExT-GQA

Large Language Model • Long-range modeling • +2

Self-Chained Image-Language Model for Video Localization and Question Answering

1 code implementation • NeurIPS 2023 • Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal

The SeViLA framework consists of two modules, Localizer and Answerer, both of which are parameter-efficiently fine-tuned from BLIP-2.

Ranked #3 on Video Question Answering on STAR Benchmark

Language Modelling • Representation Learning • +2

Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection

1 code implementation • 7 Dec 2021 • Shoubin Yu, Zhongyin Zhao, Haoshu Fang, Andong Deng, Haisheng Su, Dongliang Wang, Weihao Gan, Cewu Lu, Wei Wu

Unlike pixel-based anomaly detection methods, pose-based methods use highly structured skeleton data, which reduces the computational burden and avoids the negative impact of background noise.

Anomaly Detection In Surveillance Videos • Optical Flow Estimation • +1
