Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models

Accents are crucial in human communication as they help us understand others and allow us to communicate intelligibly in a way others understand us. While there has been significant progress in ASR, African-accented ASR has been understudied due to a lack of training datasets which are often expensive to create and demand colossal human labor. Our study aims to address this problem by automating the annotation process and reducing annotation-related expenses through informative uncertainty-based data selection. We propose a new multi-rounds adaptation process that uses epistemic uncertainty and evaluate it across several domains, datasets, and high-performing ASR models. Our results show that our approach leads to a 69.44\% WER improvement while requiring on average 45\% less data than established baselines. Our approach also improves out-of-distribution generalization for very low-resource accents, demonstrating its viability for building generalizable ASR models in the context of accented African ASR. Moreover, the results of our active learning experiments, simulating real-world settings, where there are no \textit{gold} transcriptions available, also demonstrate the ability of our approach to favor good quality real-life transcriptions. This indicates that our proposed approach addresses the immediate issue of African-accented ASR and has broader implications for improving ASR systems for other underrepresented and low-resource languages and accents. We open-source the code https://github.com/bonaventuredossou/active_learning_african_asr

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here