no code implementations • 18 May 2024 • Junzhang Liu, Zhecan Wang, Hammad Ayyubi, Haoxuan You, Chris Thomas, Rui Sun, Shih-Fu Chang, Kai-Wei Chang
Despite the widespread adoption of Vision-Language Understanding (VLU) benchmarks such as VQA v2, OKVQA, A-OKVQA, GQA, VCR, SWAG, and VisualCOMET, our analysis reveals a pervasive issue affecting their integrity: these benchmarks contain samples where answers rely on assumptions unsupported by the provided context.
no code implementations • 27 Mar 2024 • Ali Zare, Yulei Niu, Hammad Ayyubi, Shih-Fu Chang
(3) Annotation cost: annotating instructional videos with step-level labels (i.e., timestamps) or sequence-level labels (i.e., action categories) is demanding and labor-intensive, limiting generalizability to large-scale datasets. In this work, we propose a new and practical setting, called adaptive procedure planning in instructional videos, where the procedure length is not fixed or pre-determined.
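To make the "adaptive" part of the setting concrete, here is a minimal sketch of a planner that decodes action steps until it emits an end-of-procedure token, so the plan length is decided at inference time rather than fixed in advance. All names, the conditioning scheme, and the greedy decoding loop are illustrative assumptions, not the paper's released model.

```python
import torch
import torch.nn as nn

class AdaptivePlanner(nn.Module):
    """Hypothetical sketch: decode a variable-length procedure from
    start/goal observations, stopping at an end-of-procedure token."""

    def __init__(self, num_actions, dim=256, max_steps=12):
        super().__init__()
        self.eop = num_actions  # extra vocabulary index = end-of-procedure
        self.embed = nn.Embedding(num_actions + 1, dim)
        self.rnn = nn.GRUCell(dim, dim)
        self.head = nn.Linear(dim, num_actions + 1)
        self.max_steps = max_steps

    def forward(self, start_feat, goal_feat):
        # Condition the decoder state on start and goal observations,
        # both assumed to be (1, dim) visual features.
        h = start_feat + goal_feat
        prev = torch.zeros_like(h)  # dummy input for the first step
        steps = []
        for _ in range(self.max_steps):
            h = self.rnn(prev, h)
            action = self.head(h).argmax(-1)    # greedy choice, (1,)
            if action.item() == self.eop:       # stop: length is adaptive
                break
            steps.append(action.item())
            prev = self.embed(action)
        return steps
```

Under this sketch, two different start/goal pairs can yield plans of different lengths without any length label at training or test time, which is the point of the adaptive setting.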
1 code implementation • 22 Oct 2022 • Long Chen, Yulei Niu, Brian Chen, Xudong Lin, Guangxing Han, Christopher Thomas, Hammad Ayyubi, Heng Ji, Shih-Fu Chang
Specifically, given an article and a relevant video, WSAG aims to localize all "groundable" sentences in the video, and these sentences may appear at different semantic scales.
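A minimal sketch of the WSAG input/output contract may help: score each article sentence against per-clip video features and keep sentences whose best-matching span clears a threshold, leaving the rest unlocalized. The embedding model, the threshold, and the span-growing heuristic are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def localize_sentences(sent_embs, clip_embs, threshold=0.5):
    """Hypothetical WSAG-style localization.

    sent_embs: (S, d) sentence embeddings, L2-normalized.
    clip_embs: (T, d) per-clip video features, L2-normalized.
    Returns, per sentence, None (deemed ungroundable) or the
    (start, end) clip indices of its best-matching contiguous span.
    """
    sims = sent_embs @ clip_embs.T  # (S, T) cosine similarities
    results = []
    for row in sims:
        t = int(row.argmax())
        if row[t] < threshold:
            results.append(None)  # no clip matches: not groundable
            continue
        # Grow the span around the peak while similarity stays high,
        # so coarse sentences can ground to longer video spans.
        lo = hi = t
        while lo > 0 and row[lo - 1] >= threshold:
            lo -= 1
        while hi < len(row) - 1 and row[hi + 1] >= threshold:
            hi += 1
        results.append((lo, hi))
    return results
```

Returning spans of varying width is one simple way to reflect the abstract's point that groundable sentences sit at different semantic scales: a specific sentence grounds to one clip, a summary-level one to a longer stretch.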