GEM (General Evaluation benchmark for Multimodal tasks) is a benchmark dataset designed to evaluate the performance of cross-modal pre-trained models on both understanding and generation tasks. Unlike existing benchmarks such as GLUE, SuperGLUE, XGLUE, and XTREME, which primarily focus on natural language tasks, GEM is a large-scale vision-language benchmark.
Here are the key features of GEM:
Two Subsets: GEM is composed of GEM-I, which covers image-language tasks, and GEM-V, which covers video-language tasks.
Large-Scale Dataset: GEM is one of the largest vision-language datasets available, covering image-language and video-language tasks at the same time.
Multilingual Labeling: The dataset is labeled in multiple languages, making it versatile for multilingual multimodal research.
Baseline Models: The creators of GEM provide two baseline models to facilitate research and development in this area.
The goal of GEM is to advance the field of multimodal research by providing a comprehensive evaluation benchmark that spans vision and language modalities. Researchers can use this dataset to assess the capabilities of their models across different tasks and languages¹².
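For illustration only, here is a minimal sketch of how one might score a model on a text-to-image retrieval task of the kind GEM evaluates, using a plain Recall@K metric over a query-by-candidate similarity matrix. This is not the official GEM evaluation code; the function name, array shapes, and the assumption that query i matches candidate i are all assumptions made for the example.

```python
import numpy as np

def recall_at_k(similarity, k=5):
    """Recall@K for text-to-image retrieval.

    similarity: (num_queries, num_candidates) matrix of model scores,
    where the ground-truth candidate for query i is assumed to be index i.
    Returns the fraction of queries whose correct candidate ranks in the top K.
    """
    # Indices of the K highest-scoring candidates for each query.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # Ground-truth index for each query (assumed diagonal correspondence).
    correct = np.arange(similarity.shape[0])[:, None]
    return float(np.mean(np.any(top_k == correct, axis=1)))

if __name__ == "__main__":
    # Random scores for 100 queries over 100 candidates, as a smoke test.
    rng = np.random.default_rng(0)
    scores = rng.standard_normal((100, 100))
    print(recall_at_k(scores, k=5))
```

In practice, the similarity matrix would come from the model under evaluation (e.g., cosine similarity between text and image embeddings), computed separately for each language split in the benchmark.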
(1) GEM: A General Evaluation Benchmark for Multimodal Tasks. https://arxiv.org/abs/2106.09889 (https://doi.org/10.48550/arXiv.2106.09889).
(2) GEM Submission Instructions. https://microsoft.github.io/GEM/.
(3) GEM: A General Evaluation Benchmark for Multimodal Tasks. Microsoft Research. https://www.microsoft.com/en-us/research/publication/gem-a-general-evaluation-benchmark-for-multimodal-tasks/.