1 code implementation • 6 Apr 2024 • Rustem Yeshpanov, Pavel Efimov, Leonid Boytsov, Ardak Shalkarbayuli, Pavel Braslavski
We introduce KazQAD, a Kazakh open-domain question answering (ODQA) dataset that can be used in both reading comprehension and full ODQA settings, as well as for information retrieval experiments.
1 code implementation • 1 Apr 2024 • Adal Abilbekov, Saida Mussakhojayeva, Rustem Yeshpanov, Huseyin Atakan Varol
This study focuses on the creation of the KazEmoTTS dataset, designed for emotional Kazakh text-to-speech (TTS) applications.
1 code implementation • 28 Mar 2024 • Rustem Yeshpanov, Alina Polonskaya, Huseyin Atakan Varol
We introduce KazParC, a parallel corpus designed for machine translation across Kazakh, English, Russian, and Turkish.
1 code implementation • 28 Mar 2024 • Rustem Yeshpanov, Huseyin Atakan Varol
This paper presents KazSAnDRA, the first and largest publicly available dataset for Kazakh sentiment analysis.
1 code implementation • 25 May 2023 • Rustem Yeshpanov, Saida Mussakhojayeva, Yerbolat Khassanov
This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.
1 code implementation • LREC 2022 • Rustem Yeshpanov, Yerbolat Khassanov, Huseyin Atakan Varol
We present the development of a dataset for Kazakh named entity recognition.