no code implementations • 30 Jul 2023 • Eric Sun, Jinyu Li, Jian Xue, Yifan Gong
When mixing 20,000 hours of augmented speech data generated by our method with 12,500 hours of original transcribed speech data for Italian Transformer transducer model pre-training, we achieve an 8.7% relative word error rate reduction.
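For reference, a relative word error rate reduction is measured against the baseline WER, not as an absolute difference. A minimal sketch (the 10.0% and 9.13% values below are illustrative, not figures from the paper):

```python
def relative_werr(baseline_wer: float, new_wer: float) -> float:
    """Relative word error rate reduction, as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer

# Illustrative values only: an 8.7% relative reduction means the new WER
# is 91.3% of the baseline, e.g. a 10.0% WER dropping to 9.13%.
print(round(relative_werr(10.0, 9.13), 3))  # 0.087
```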
no code implementations • 1 Mar 2023 • Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference.
no code implementations • 5 Nov 2022 • Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li
In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
Automatic Speech Recognition (ASR) +4
no code implementations • 4 Nov 2022 • Jian Xue, Peidong Wang, Jinyu Li, Eric Sun
In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which can transcribe or translate multiple spoken languages into texts of the target language.
no code implementations • 10 Dec 2021 • Kenichi Kumatani, Robert Gmyr, Felipe Cruz Salinas, Linquan Liu, Wei Zuo, Devang Patel, Eric Sun, Yu Shi
The sparsely-gated Mixture of Experts (MoE) can increase network capacity with little additional computational cost.
Automatic Speech Recognition (ASR) +2
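The sparsely-gated MoE idea can be sketched as follows (a hypothetical, minimal NumPy illustration, not the paper's implementation): a router scores the experts for each input, and only the top-k experts are evaluated, so capacity grows with the number of experts while per-example compute stays roughly constant.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 8, 4, 2

# Each "expert" here is a simple linear layer; the router is another linear map.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route x through only the top-k experts, weighted by softmax gates."""
    logits = x @ router
    top = np.argsort(logits)[-k:]                 # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()
    # Only k of the n_experts weight matrices are multiplied: sparse activation.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_layer(rng.standard_normal(d))
print(y.shape)  # (8,)
```

The gate weights sum to one over the selected experts, so the layer output stays on the same scale regardless of k.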
no code implementations • 15 Oct 2021 • Rimita Lahiri, Kenichi Kumatani, Eric Sun, Yao Qian
Multilingual end-to-end (E2E) models have shown great potential for expanding language coverage in automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +4
no code implementations • 13 Jul 2021 • Long Zhou, Jinyu Li, Eric Sun, Shujie Liu
In particular, a single CMM can be deployed in any user scenario where users can pre-select any combination of languages.
Automatic Speech Recognition (ASR) +1
no code implementations • 4 Jun 2021 • Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong
In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weights tuning during inference.
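The MWER objective referenced above minimizes the expected number of word errors over an n-best list, with hypothesis posteriors renormalized over that list. A schematic sketch (the scores, error counts, and function name below are illustrative, not the authors' implementation):

```python
import math

def expected_word_errors(hyp_scores, hyp_word_errors):
    """Expected word errors over an n-best list; the posteriors are a
    softmax over the (fusion) scores of the hypotheses in the list."""
    m = max(hyp_scores)
    probs = [math.exp(s - m) for s in hyp_scores]   # shift for stability
    z = sum(probs)
    probs = [p / z for p in probs]
    return sum(p * e for p, e in zip(probs, hyp_word_errors))

# Toy 3-best list: scores and word-error counts are made up for illustration.
print(expected_word_errors([2.0, 1.0, 0.5], [1, 2, 4]))
```

Training to minimize this expectation pushes probability mass toward the hypotheses with fewer word errors, which is why the LM weight can be folded into training rather than tuned at inference time.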
no code implementations • 2 Feb 2021 • Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong
The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method.
Automatic Speech Recognition (ASR) +3
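Conceptually, ILME-based fusion rescores each hypothesis by subtracting an estimate of the E2E model's internal LM before adding the weighted external LM. A schematic sketch (the weights and log-probabilities below are made up for illustration, not taken from the paper):

```python
def ilme_score(log_p_e2e: float, log_p_ext_lm: float, log_p_int_lm: float,
               lam_ext: float = 0.6, lam_int: float = 0.3) -> float:
    """Inference score for one hypothesis under ILME-based fusion:
    subtract the estimated internal LM, add the weighted external LM."""
    return log_p_e2e - lam_int * log_p_int_lm + lam_ext * log_p_ext_lm

# Made-up log-probabilities for illustration only.
print(ilme_score(-5.0, -4.0, -3.0))  # approximately -6.5
```

Removing the internal LM contribution keeps the E2E score closer to a pure acoustic likelihood, which is what makes the external LM integration more effective than plain shallow fusion.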
no code implementations • 3 Nov 2020 • Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong
External language model (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR), which has no clear division between acoustic and language models.
Automatic Speech Recognition (ASR) +3
no code implementations • 17 Mar 2020 • Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong
While the community keeps promoting end-to-end models over conventional hybrid models, which are usually long short-term memory (LSTM) models trained with a cross-entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved.
Automatic Speech Recognition (ASR) +1
no code implementations • 9 Sep 2019 • Liang Lu, Eric Sun, Yifan Gong
Furthermore, the auxiliary loss works as a regularizer, improving the generalization capacity of the network.
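An auxiliary loss used this way typically enters training as a down-weighted term added to the main objective. A hypothetical sketch (the weight `alpha`, loss values, and function name are illustrative, not the paper's formulation):

```python
def total_loss(main_loss: float, aux_loss: float, alpha: float = 0.1) -> float:
    """Main objective plus a down-weighted auxiliary term; the auxiliary
    branch shapes the shared representation during training but is
    discarded at inference, so it acts as a regularizer."""
    return main_loss + alpha * aux_loss

# Illustrative loss values only.
print(total_loss(2.0, 5.0))  # 2.5
```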