Clotho

Introduced by Drossos et al. in Clotho: An Audio Captioning Dataset

Clotho is an audio captioning dataset, consisting of 4981 audio samples, and each audio sample has five captions (a total of 24 905 captions). Audio samples are of 15 to 30 s duration and captions are eight to 20 words long.

Source: https://zenodo.org/record/3490684

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


Modalities


Languages