The M2KR is a collection of datasets designed for training and evaluating general-purpose vision-language retrievers. These datasets are released in Huggingface Dataset format and cover various retrieval tasks. Let's delve into the details:
The M2KR benchmark comprises nine datasets, each tailored for specific tasks. Some of these datasets include:
These datasets enable researchers to develop and evaluate vision-language models, and they play a crucial role in advancing the field of multimodal understanding and retrieval¹².
(1) M2KR Benchmark Datasets - GitHub. https://github.com/LinWeizheDragon/FLMR/blob/main/docs/Datasets.md. (2) arXiv:2402.08327v1 [cs.CL] 13 Feb 2024. https://arxiv.org/pdf/2402.08327.pdf. (3) Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers. https://preflmr.github.io/. (4) undefined. https://avatars.githubusercontent.com/u/33350454?v=4. (5) undefined. https://github.com/LinWeizheDragon/FLMR/blob/main/docs/Datasets.md?raw=true. (6) undefined. https://desktop.github.com. (7) undefined. https://github.com/LinWeizheDragon/FLMR/raw/main/docs/Datasets.md.
Paper | Code | Results | Date | Stars |
---|