OCR-VQA Dataset | Papers With Code

The **OCR-VQA dataset** is a benchmark for **Visual Question Answering (VQA)** in which answering a question requires reading the text that appears in the image.

1. **Dataset Overview**:
    - The **OCR-VQA dataset** contains **207,572 images** of book covers and more than one million associated **question-answer pairs**.
    - Because the questions concern text printed on the covers, answering them depends on reading that text; the source description also states that the images are accompanied by their corresponding **OCR transcriptions**¹².

2. **Purpose and Significance**:
    - **Visual Question Answering (VQA)** tasks require models to reason jointly over visual information (such as images) and natural language inputs (such as questions).
    - With this dataset, researchers can train and evaluate models that answer questions by combining the visual content of an image with the text read from it.

3. **Other Related VQA Datasets**:
    - Apart from OCR-VQA, there are other VQA datasets available for research and benchmarking:
        - **ScreenQA**: Questions about the content of mobile app screenshots.
        - **MP-DocVQA**: VQA over multi-page documents.
        - **ChartQA**: Specifically designed for answering questions about charts.
        - **InfographicVQA**: For handling questions related to infographics.
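
To make the idea of evaluating a model on image-question-answer triples concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the dataset's official format or metric: the `VQAExample` record, its field names, the `normalize` helper, and the exact-match scoring rule are all hypothetical, and the sample records are invented rather than taken from OCR-VQA.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VQAExample:
    """Hypothetical record: an image reference plus one question-answer pair."""
    image_id: str
    question: str
    answer: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(examples: Sequence[VQAExample],
                         predictions: Sequence[str]) -> float:
    """Fraction of predictions equal to the reference answer after normalization.

    Assumes predictions[i] is the model's answer for examples[i].
    """
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(ex.answer)
        for ex, pred in zip(examples, predictions)
    )
    return correct / len(examples)


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    samples = [
        VQAExample("img_001", "Who wrote this book?", "Jane Doe"),
        VQAExample("img_002", "What is the title of this book?", "A Sample Title"),
    ]
    model_answers = ["jane doe", "Another Title"]
    print(exact_match_accuracy(samples, model_answers))
```

Exact match is only one possible scoring rule; it is used here because it is simple to verify by hand, and a real evaluation may normalize answers differently.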

Source: Conversation with Bing, 3/15/2024
(1) OCR-VQA Dataset | Papers With Code. https://paperswithcode.com/dataset/ocr-vqa.
(2) GitHub - anisha2102/docvqa: Document Visual Question Answering. https://github.com/anisha2102/docvqa.
(3) VQA: Visual Question Answering. https://visualqa.org/.
(4) allenai/aokvqa: Official repository for the A-OKVQA dataset - GitHub. https://github.com/allenai/aokvqa.

---

OCR-VQA

Benchmarks

Papers

Dataset Loaders

Tasks

Similar Datasets

ScreenQA Short

Screen2Words

MP-DocVQA

InfographicVQA

Usage

License

Modalities

Languages
