M2KR (Multi-task Multi-modal Knowledge Retrieval)

Introduced by Lin et al. in PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

M2KR is a collection of datasets for training and evaluating general-purpose vision-language retrievers. The datasets are released in Hugging Face Datasets format and cover three retrieval tasks:

  1. Image to Text (I2T) retrieval: This task involves retrieving relevant textual descriptions given an input image.
  2. Question to Text (Q2T) retrieval: Here, the goal is to retrieve relevant text passages based on a given question.
  3. Image & Question to Text (IQ2T) retrieval: This task combines both image and question inputs to retrieve relevant textual information.
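
The three tasks differ only in which inputs make up the query. The following is a minimal sketch of that distinction, not the benchmark's own code: the `Query` class, its field names `image_path` and `question`, and the `task_of` helper are hypothetical and chosen for illustration; the released datasets may use a different schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RetrievalTask(Enum):
    I2T = "Image to Text"
    Q2T = "Question to Text"
    IQ2T = "Image & Question to Text"


@dataclass(frozen=True)
class Query:
    # Hypothetical field names; the actual dataset columns may differ.
    image_path: Optional[str] = None
    question: Optional[str] = None


def task_of(query: Query) -> RetrievalTask:
    """Infer the retrieval task from which query inputs are present."""
    has_image = query.image_path is not None
    has_question = query.question is not None
    if has_image and has_question:
        return RetrievalTask.IQ2T
    if has_image:
        return RetrievalTask.I2T
    if has_question:
        return RetrievalTask.Q2T
    raise ValueError("a query needs an image, a question, or both")
```

For example, `task_of(Query(image_path="cat.jpg", question="What breed is this?"))` returns `RetrievalTask.IQ2T`. In all three cases the retrieval target is text.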

The M2KR benchmark comprises nine datasets, each assigned to one of these tasks. They include:

  • WIT (Wikipedia-based Image Text): A dataset for I2T retrieval.
  • IGLUE (Image-Grounded Language Understanding Evaluation): Used for I2T retrieval.
  • KVQA (Knowledge-aware Visual Question Answering): Used for I2T retrieval.
  • CC3M (Conceptual Captions 3M): Another dataset for I2T retrieval.
  • OVEN (Open-domain Visual Entity Recognition): Used for IQ2T retrieval.
  • LLaVA (Large Language and Vision Assistant): Used for IQ2T retrieval.
  • OKVQA (Outside Knowledge Visual Question Answering): Used for IQ2T retrieval.
  • Infoseek: A dataset for IQ2T retrieval.
  • E-VQA (Encyclopedic Visual Question Answering): Used for IQ2T retrieval.

Together, these datasets let researchers train and evaluate a single vision-language retriever across all three task types (1, 2).

(1) M2KR Benchmark Datasets - GitHub. https://github.com/LinWeizheDragon/FLMR/blob/main/docs/Datasets.md
(2) Lin et al. PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers. arXiv:2402.08327v1 [cs.CL], 13 Feb 2024. https://arxiv.org/pdf/2402.08327.pdf
(3) Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers (project page). https://preflmr.github.io/

License


  • Unknown
