MSVD-QA

The MSVD-QA dataset is a Video Question Answering (VideoQA) dataset. It is built on the Microsoft Research Video Description (MSVD) dataset, which consists of about 120K sentences describing roughly 2,000 video snippets. In MSVD-QA, question-answer (QA) pairs are generated from these descriptions. MSVD is mainly used in video captioning experiments, but because of its large size it is also used for VideoQA. MSVD-QA contains 1,970 video clips and approximately 50.5K QA pairs.


Benchmarks


Task	Dataset Variant	Best Model
Visual Question Answering (VQA)	MSVD-QA	VLAB
Zero-Shot Video Question Answer	MSVD-QA	PLLaVA
Visual Question Answering	MSVD-QA	FrozenBiLM
Zero-Shot Learning	MSVD-QA	HiTeA
Zeroshot Video Question Answer	MSVD-QA	FrozenBiLM