MathVerse

Introduced by Zhang et al. in MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

MathVerse is a benchmark designed to evaluate how well Multi-modal Large Language Models (MLLMs) interpret and reason over visual information in mathematical problems. Developed by a research team from CUHK MMLab and Shanghai Artificial Intelligence Laboratory, it aims to provide a fair and comprehensive assessment of whether MLLMs actually understand the diagrams they are given when reasoning about math.

The benchmark consists of 2,612 high-quality, multi-subject math problems with diagrams, collected from publicly available sources. Human annotators transform each problem into six distinct versions that differ in how much of the information is carried by each modality. This yields about 15K test samples in total, which lets MathVerse examine whether, and to what extent, MLLMs genuinely comprehend visual diagrams when solving mathematical problems.
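The "15K" figure can be checked against the two counts given above. The snippet below is a minimal sketch of that arithmetic. It assumes every one of the 2,612 problems appears in all six versions, which the description implies but does not state outright. The variable names are illustrative only.

```python
# Counts taken from the dataset description above.
NUM_PROBLEMS = 2_612
VERSIONS_PER_PROBLEM = 6

# Assumption: each problem is present in every version, with none dropped.
total_samples = NUM_PROBLEMS * VERSIONS_PER_PROBLEM

print(f"{total_samples:,} test samples")  # prints "15,672 test samples"
```

Under that assumption the product is 15,672, consistent with the rounded figure of 15K.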

Similar Datasets

Q-Bench

Geometry3K

GeoQA
