Video to Text Retrieval

8 papers with code • 2 benchmarks • 2 datasets

Video-to-text retrieval is the task of retrieving the text (e.g., a caption or description) that best matches a given video query, typically by ranking candidate sentences by their similarity to the video in a shared embedding space.

Most implemented papers

Learning a Text-Video Embedding from Incomplete and Heterogeneous Data

antoine77340/Mixture-of-Embedding-Experts 7 Apr 2018

We evaluate our method on the task of video retrieval and report results for the MPII Movie Description and MSR-VTT datasets.
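Results on these benchmarks are conventionally reported as Recall@K over a video-text similarity matrix. Below is a minimal sketch, not tied to this paper's code, that computes video-to-text Recall@1/5/10 from precomputed embeddings; the random tensors stand in for real model outputs.

```python
import torch

def recall_at_k(video_emb: torch.Tensor, text_emb: torch.Tensor, ks=(1, 5, 10)):
    """Video-to-text Recall@K, assuming row i of each tensor is a matched pair."""
    # Cosine similarity between every video and every caption.
    video_emb = torch.nn.functional.normalize(video_emb, dim=-1)
    text_emb = torch.nn.functional.normalize(text_emb, dim=-1)
    sims = video_emb @ text_emb.t()          # (N_videos, N_texts)
    # Rank of the ground-truth caption for each video query (0 = top-1 hit).
    target = torch.arange(sims.size(0))
    order = sims.argsort(dim=1, descending=True)
    ranks = (order == target[:, None]).float().argmax(dim=1)
    return {f"R@{k}": (ranks < k).float().mean().item() * 100 for k in ks}

# Toy usage with random embeddings standing in for model outputs.
v, t = torch.randn(100, 256), torch.randn(100, 256)
print(recall_at_k(v, t))
```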

Bridging Video-text Retrieval with Multiple Choice Questions

tencentarc/mcq CVPR 2022

As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e.g., action recognition with linear evaluation.

CLIP2Video: Mastering Video-Text Retrieval via Image CLIP

CryhanFang/CLIP2Video 21 Jun 2021

We present the CLIP2Video network to transfer the image-language pre-training model to video-text retrieval in an end-to-end manner.
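The simplest way to adapt an image-level CLIP model to video, which approaches like CLIP2Video build beyond, is to encode sampled frames independently and pool them into a single video embedding. Below is a minimal sketch of that frame-pooling baseline using Hugging Face's CLIP; the checkpoint name and mean pooling are illustrative choices, not the paper's temporal modules.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def encode_video(frames):
    """frames: list of PIL images sampled from the video."""
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        frame_emb = model.get_image_features(**inputs)  # (T, D) per-frame features
    video_emb = frame_emb.mean(dim=0)                   # mean pooling over time (baseline choice)
    return torch.nn.functional.normalize(video_emb, dim=-1)

def encode_text(captions):
    """captions: list of strings."""
    inputs = processor(text=captions, return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(text_emb, dim=-1)

# Retrieval: rank candidate captions by cosine similarity to the pooled video.
# scores = encode_text(captions) @ encode_video(frames)
```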

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

google-research/google-research 1 Apr 2022

Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on.

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval

tencentarc/mcq 26 Apr 2022

Dominant pre-training work for video-text retrieval mainly adopts "dual-encoder" architectures to enable efficient retrieval, where two separate encoders contrast global video and text representations but ignore detailed local semantics.
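For reference, the dual-encoder setup the abstract refers to boils down to a symmetric InfoNCE loss over a batch of paired global video and text embeddings. A minimal, self-contained sketch (the encoders themselves are placeholders):

```python
import torch
import torch.nn.functional as F

def dual_encoder_contrastive_loss(video_emb, text_emb, temperature=0.05):
    """Symmetric InfoNCE over a batch of matched (video_i, text_i) pairs."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(logits.size(0))   # diagonal entries are the positives
    # Contrast in both directions: video-to-text and text-to-video.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

# Toy usage: batch of 32 global embeddings from two separate encoders.
loss = dual_encoder_contrastive_loss(torch.randn(32, 512), torch.randn(32, 512))
```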

MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian

willyfh/msvd-indonesian 20 Jun 2023

Since pretraining resources with Indonesian sentences are relatively limited, the applicability of those approaches to our dataset remains questionable.

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

leolee99/pau NeurIPS 2023

In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.
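As a generic illustration of scoring retrieval ambiguity (not the PAU method itself, which relies on learned prototypes), one common proxy is the entropy of the softmax-normalized similarity distribution: a query whose similarities are spread over many candidates is more ambiguous than one with a single sharp match.

```python
import torch
import torch.nn.functional as F

def retrieval_uncertainty(query_emb, gallery_emb, temperature=0.05):
    """Entropy of the match distribution as a simple ambiguity proxy (illustrative only)."""
    sims = F.normalize(query_emb, dim=-1) @ F.normalize(gallery_emb, dim=-1).t()
    probs = F.softmax(sims / temperature, dim=-1)           # match distribution per query
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy                                          # higher = more ambiguous query

# Queries with several near-duplicate gallery matches receive higher scores.
u = retrieval_uncertainty(torch.randn(8, 256), torch.randn(100, 256))
```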

Sakuga-42M Dataset: Scaling Up Cartoon Research

zhenglinpan/SakugaDataset 13 May 2024

Can we harness the success of the scaling paradigm to benefit cartoon research?