Search Results for author: Shoubin Yu

Found 7 papers, 6 papers with code

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

1 code implementation • 29 May 2024 • Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal

Recently, many long video-language understanding approaches have leveraged the reasoning capabilities of Large Language Models (LLMs) to perform long video QA: they transform videos into densely sampled frame captions and ask LLMs to respond to text queries over those captions.

Video Understanding · Zero-Shot Video Question Answer

RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives

1 code implementation • 28 May 2024 • Jaehong Yoon, Shoubin Yu, Mohit Bansal

This paper proposes RACCooN, a versatile and user-friendly video-to-paragraph-to-video generative framework that supports multiple video editing capabilities such as removal, addition, and modification, through a unified pipeline.

Attribute · Video Editing

STAR: A Benchmark for Situated Reasoning in Real-World Videos

no code implementations • NeurIPS 2021 • Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

This paper introduces a new benchmark that evaluates the situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos, called Situated Reasoning in Real-World Videos (STAR Benchmark).

Logical Reasoning · Question Answering

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion

1 code implementation • 8 Feb 2024 • Shoubin Yu, Jaehong Yoon, Mohit Bansal

Furthermore, we propose a fusion module designed to compress multimodal queries, maintaining computational efficiency in the LLM while incorporating additional modalities.

Computational Efficiency · Optical Flow Estimation · +2

A Simple LLM Framework for Long-Range Video Question-Answering

1 code implementation • 28 Dec 2023 • Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

Furthermore, we show that a specialized prompt, which asks the LLM first to summarize the noisy short-term visual captions and then answer a given input question, leads to a significant LVQA performance boost.

Large Language Model · Long-range modeling · +2

Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection

1 code implementation • 7 Dec 2021 • Shoubin Yu, Zhongyin Zhao, Haoshu Fang, Andong Deng, Haisheng Su, Dongliang Wang, Weihao Gan, Cewu Lu, Wei Wu

Unlike pixel-based anomaly detection methods, pose-based methods utilize highly structured skeleton data, which decreases the computational burden and avoids the negative impact of background noise.

Anomaly Detection In Surveillance Videos · Optical Flow Estimation · +1
