BiMed1.3M

Introduced by Pieri et al. in BiMediX: Bilingual Medical Mixture of Experts LLM

The dataset covers three types of medical interactions in both English and Arabic:
- Multiple-choice question answering (MCQA), focusing on specialized medical knowledge.
- Open question answering (QA), including real-world consumer questions.
- MCQA-grounded multi-turn chat conversations for dynamic exchanges.

A semi-automated translation pipeline with human alignment was used to create high-quality Arabic versions. BiMed1.3M was built by translating 444,995 English samples into Arabic and then mixing Arabic and English samples in a 1:2 ratio.
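To make the 1:2 mixing step concrete, here is a minimal Python sketch. It is not the authors' pipeline: the function `mix_by_ratio`, its parameters, and the random sampling of English items are all assumptions made for illustration. The arithmetic at the end assumes every translated Arabic sample is kept and that the ratio is exact, which the description above does not state explicitly.

```python
import random


def mix_by_ratio(arabic, english, ar_parts=1, en_parts=2, seed=0):
    """Hypothetical helper: combine Arabic and English samples at ar_parts:en_parts.

    Keeps every Arabic sample and draws the matching number of English
    samples at random (an assumption; the source does not say how the
    English portion was selected).
    """
    n_english = len(arabic) * en_parts // ar_parts
    if n_english > len(english):
        raise ValueError("not enough English samples for the requested ratio")
    rng = random.Random(seed)
    mixed = list(arabic) + rng.sample(list(english), n_english)
    rng.shuffle(mixed)  # interleave the two languages
    return mixed


# Sizes implied by an exact 1:2 Arabic:English ratio (assumption, see above).
ARABIC_SAMPLES = 444_995
implied_english = ARABIC_SAMPLES * 2
implied_total = ARABIC_SAMPLES + implied_english
```

Under those assumptions the implied total is 1,334,985 samples, which is consistent with the "1.3M" in the dataset name.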

License


  • Unknown
