no code implementations • 7 Oct 2023 • Peili Chen, Linyang He, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li
Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception.
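The listing gives no analysis code, but model-brain alignment of this kind is commonly quantified with a linear encoding model that predicts neural responses from a model layer's features. Below is a minimal sketch of that generic approach on synthetic data; the array names, shapes, and ridge penalty are illustrative, not taken from the paper.

```python
# Minimal sketch of a linear encoding analysis: predict neural responses from
# SSL-model features and score held-out correlation. All data here is synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_timepoints, n_features, n_electrodes = 1000, 768, 64

# Hypothetical stand-ins: model-layer activations time-aligned with recorded
# brain responses (e.g., high-gamma power per electrode).
model_features = rng.standard_normal((n_timepoints, n_features))
brain_responses = model_features @ rng.standard_normal((n_features, n_electrodes)) * 0.1
brain_responses += rng.standard_normal((n_timepoints, n_electrodes))

X_tr, X_te, y_tr, y_te = train_test_split(
    model_features, brain_responses, test_size=0.2, random_state=0)
encoder = Ridge(alpha=100.0).fit(X_tr, y_tr)
pred = encoder.predict(X_te)

# "Brain score": per-electrode correlation between predicted and actual responses.
scores = [np.corrcoef(pred[:, e], y_te[:, e])[0, 1] for e in range(n_electrodes)]
print(f"mean encoding correlation: {np.mean(scores):.3f}")
```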
no code implementations • 5 Jun 2023 • Li Fu, Siqi Li, Qingtao Li, Fangzhu Li, Liping Deng, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He
Self-Supervised Learning (SSL) Automatic Speech Recognition (ASR) models have shown great promise over Supervised Learning (SL) ones in low-resource settings (a fine-tuning sketch follows the tags below).
Automatic Speech Recognition (ASR) +2
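The snippet does not spell out the training recipe, but the standard SSL-ASR baseline it alludes to is fine-tuning a pretrained speech encoder with a CTC head on a small labeled set. A minimal sketch with Hugging Face transformers, assuming a wav2vec 2.0 checkpoint; the checkpoint name and the toy one-utterance batch are illustrative, not from the paper.

```python
# Fine-tune a pretrained SSL encoder (wav2vec 2.0) with CTC on labeled audio.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

waveform = torch.randn(16000)  # 1 s of fake 16 kHz audio
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
labels = processor(text=["HELLO WORLD"], return_tensors="pt").input_ids

# One supervised step: CTC loss on the labeled utterance.
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()
print(f"CTC loss: {loss.item():.3f}")
```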
no code implementations • 26 Oct 2022 • Li Fu, Siqi Li, Qingtao Li, Liping Deng, Fangzhu Li, Lu Fan, Meng Chen, Xiaodong He
In this paper, we propose a Unified pre-training Framework for Online and Offline (UFO2) Automatic Speech Recognition (ASR), which 1) simplifies the two separate training workflows for online and offline modes into one process, and 2) improves Word Error Rate (WER) with limited utterance annotation (a sketch of the chunked-attention idea follows the tags below).
Automatic Speech Recognition (ASR) +2
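The snippet does not describe UFO2's mechanism; a common way to serve online and offline ASR with one set of weights is dynamic chunk-based self-attention masking. The helper below sketches that general idea, not the paper's exact recipe.

```python
# Illustrative sketch (not UFO2's exact design): one encoder serves both modes
# via a chunked self-attention mask. chunk_size <= 0 recovers the full-context
# (offline) mode; a finite chunk gives the streaming (online) mode.
import torch

def chunk_attention_mask(seq_len: int, chunk_size: int) -> torch.Tensor:
    """True where attention is allowed."""
    if chunk_size <= 0:  # offline: every frame sees the whole sequence
        return torch.ones(seq_len, seq_len, dtype=torch.bool)
    idx = torch.arange(seq_len)
    chunk_end = (idx // chunk_size + 1) * chunk_size
    # Each frame attends to all frames up to the end of its own chunk.
    return idx.unsqueeze(0) < chunk_end.unsqueeze(1)

print(chunk_attention_mask(6, 2).int())
# During training, chunk_size would be sampled per batch so the same weights
# learn both behaviors; at inference, the mode is picked by choosing the mask.
```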
no code implementations • 8 Oct 2021 • Li Fu, Xiaoxiao Li, Runyu Wang, Lu Fan, Zhengchen Zhang, Meng Chen, Youzheng Wu, Xiaodong He
End-to-end Automatic Speech Recognition (ASR) models are usually trained to optimize the loss over the whole token sequence, while neglecting explicit phoneme-granularity supervision (a multi-task sketch follows the tags below).
Automatic Speech Recognition (ASR) +3
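How the paper injects phoneme-level signal is not shown here; a generic way to add explicit phoneme-granularity supervision is an auxiliary phoneme CTC branch next to the main token-level objective. The sketch below illustrates that multi-task setup on random tensors; all sizes and the auxiliary weight are guesses.

```python
# Generic multi-task sketch: an auxiliary phoneme-level CTC branch on the
# shared encoder, alongside the main token-level CTC loss.
import torch
import torch.nn as nn

T, B, D = 50, 4, 256               # frames, batch, encoder dim
n_tokens, n_phonemes = 5000, 100   # illustrative vocabulary sizes

encoder_out = torch.randn(T, B, D)        # stand-in for encoder features
token_head = nn.Linear(D, n_tokens)
phoneme_head = nn.Linear(D, n_phonemes)   # auxiliary phoneme classifier
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

token_targets = torch.randint(1, n_tokens, (B, 12))
phone_targets = torch.randint(1, n_phonemes, (B, 20))
in_lens = torch.full((B,), T, dtype=torch.long)

loss_token = ctc(token_head(encoder_out).log_softmax(-1), token_targets,
                 in_lens, torch.full((B,), 12, dtype=torch.long))
loss_phone = ctc(phoneme_head(encoder_out).log_softmax(-1), phone_targets,
                 in_lens, torch.full((B,), 20, dtype=torch.long))

lam = 0.3                                 # auxiliary weight (a guess)
loss = loss_token + lam * loss_phone
print(f"joint loss: {loss.item():.3f}")
```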
no code implementations • 11 May 2020 • Li Fu, Xiaoxiao Li, Libo Zi, Zhengchen Zhang, Youzheng Wu, Xiaodong He, BoWen Zhou
In this paper, we propose an incremental learning method for end-to-end Automatic Speech Recognition (ASR) which enables an ASR system to perform well on new tasks while maintaining the performance on its originally learned ones (a distillation-style sketch follows the tags below).
Automatic Speech Recognition (ASR) +3
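The abstract does not spell out the method; a standard ingredient for this setting is response-based knowledge distillation from a frozen copy of the old model, sketched below. The weights alpha and tau are illustrative, not the paper's.

```python
# Sketch of the knowledge-distillation idea commonly used for incremental ASR
# (generic, not the paper's exact loss): keep a frozen copy of the old model
# and penalize the new model for drifting from its output distribution while
# it learns the new task.
import torch
import torch.nn.functional as F

def incremental_loss(new_logits, old_logits, new_task_loss, alpha=0.5, tau=2.0):
    """new_task_loss: e.g. CTC/CE on new-task data; alpha and tau are guesses."""
    kd = F.kl_div(
        F.log_softmax(new_logits / tau, dim=-1),
        F.softmax(old_logits.detach() / tau, dim=-1),  # frozen teacher
        reduction="batchmean",
    ) * tau * tau
    return (1 - alpha) * new_task_loss + alpha * kd

new_logits = torch.randn(8, 100, requires_grad=True)
old_logits = torch.randn(8, 100)   # outputs of the frozen original model
loss = incremental_loss(new_logits, old_logits, new_task_loss=torch.tensor(1.2))
loss.backward()
```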
no code implementations • 26 Apr 2020 • Li Fu, Xiaoxiao Li, Libo Zi
To improve the performance of RNN-T on Mandarin speech recognition, a novel transformer transducer is proposed that combines a self-attention transformer with an RNN.
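A shape-level sketch of such a hybrid transducer: a transformer encoder over audio frames, an RNN (LSTM) prediction network over labels, and an additive joint network trained with the RNN-T loss from torchaudio. Layer sizes and the vocabulary are illustrative, not the paper's configuration.

```python
# Hybrid RNN-T sketch: transformer audio encoder + LSTM prediction network.
import torch
import torch.nn as nn
import torchaudio

B, T, U, D, V = 2, 40, 10, 144, 60   # batch, frames, label length, dim, vocab

enc_layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=2)  # transformer branch
predictor = nn.LSTM(D, D, batch_first=True)               # RNN branch
embed = nn.Embedding(V, D)
joint = nn.Linear(D, V)

audio = torch.randn(B, T, D)
labels = torch.randint(1, V, (B, U), dtype=torch.int32)

f = encoder(audio)                                         # (B, T, D)
g, _ = predictor(embed(torch.cat([torch.zeros(B, 1, dtype=torch.long),
                                  labels.long()], dim=1)))  # (B, U+1, D)
logits = joint(torch.tanh(f.unsqueeze(2) + g.unsqueeze(1)))  # (B, T, U+1, V)

loss = torchaudio.functional.rnnt_loss(
    logits, labels,
    torch.full((B,), T, dtype=torch.int32),
    torch.full((B,), U, dtype=torch.int32),
    blank=0,
)
print(f"RNN-T loss: {loss.item():.3f}")
```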
no code implementations • 11 Sep 2019 • Kun Zhang, Peng He, Ping Yao, Ge Chen, Rui Wu, Min Du, Huimin Li, Li Fu, Tianyao Zheng
Specifically, RAM learns a set of weights that captures the varying importance of feature maps across resolutions, and GPR gradually merges feature maps pairwise from low to high resolution to regress the final human keypoint heatmaps.
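Taking the summary at face value (the exact operations behind the RAM and GPR acronyms are not given here), a minimal sketch of learned per-resolution weighting followed by pairwise low-to-high merging might look like this:

```python
# Illustrative sketch, not the paper's implementation: learned per-resolution
# weights rescale each feature map (RAM-like), maps are merged pairwise from
# low to high resolution (GPR-like), and a 1x1 conv regresses the heatmaps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseMergeHead(nn.Module):
    def __init__(self, channels: int, n_resolutions: int, n_keypoints: int):
        super().__init__()
        # One learnable importance weight per resolution.
        self.res_weights = nn.Parameter(torch.ones(n_resolutions))
        self.out = nn.Conv2d(channels, n_keypoints, kernel_size=1)

    def forward(self, feats):                 # feats ordered low -> high res
        w = torch.softmax(self.res_weights, dim=0)
        feats = [wi * f for wi, f in zip(w, feats)]
        merged = feats[0]
        for f in feats[1:]:                   # merge every two maps, upsampling
            merged = F.interpolate(merged, size=f.shape[-2:], mode="bilinear",
                                   align_corners=False) + f
        return self.out(merged)               # keypoint heatmaps

feats = [torch.randn(1, 32, s, s) for s in (16, 32, 64)]
heatmaps = PairwiseMergeHead(32, len(feats), n_keypoints=17)(feats)
print(heatmaps.shape)                         # torch.Size([1, 17, 64, 64])
```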