no code implementations • 9 Mar 2024 • Hexin Liu, Xiangyu Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe
Performance evaluation using large language models reveals the advantage of the linguistic hint by achieving 14.1% and 5.5% relative improvement on test sets of the ASRU and SEAME datasets, respectively.
Automatic Speech Recognition (ASR) +4
no code implementations • 29 Sep 2023 • Hexin Liu, Leibny Paola Garcia, Xiangyu Zhang, Andy W. H. Khong, Sanjeev Khudanpur
Languages usually switch within a multilingual speech signal, especially in a bilingual society.
Automatic Speech Recognition (ASR) +2
1 code implementation • 31 May 2023 • Leibny Paola Garcia Perera, Y. H. Victoria Chua, Hexin Liu, Fei Ting Woon, Andy W. H. Khong, Justin Dauwels, Sanjeev Khudanpur, Suzy J. Styles
This paper introduces the inaugural Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge, focused on developing robust language identification and language diarization systems that are reliable for non-standard, accented, spontaneous code-switched, child-directed speech collected via Zoom.
1 code implementation • 30 May 2023 • Suzy J. Styles, Victoria Y. H. Chua, Fei Ting Woon, Hexin Liu, Leibny Paola Garcia Perera, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels
These overview metrics do not provide information about model performance at the level of individual speakers, recordings, or units of speech with different linguistic characteristics.
1 code implementation • 30 May 2023 • Victoria Y. H. Chua, Hexin Liu, Leibny Paola Garcia Perera, Fei Ting Woon, Jinyi Wong, Xiangyu Zhang, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles
To enhance the reliability and robustness of language identification (LID) and language diarization (LD) systems for heterogeneous populations and scenarios, there is a need for speech processing models to be trained on datasets that feature diverse language registers and speech patterns.
1 code implementation • 26 Oct 2022 • Hexin Liu, HaiHua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur
The comparison of the proposed methods indicates that incorporating language information is more effective than disentangling for reducing language confusion in CS speech.
Automatic Speech Recognition (ASR) +1
1 code implementation • 23 Mar 2022 • Hexin Liu, Leibny Paola Garcia Perera, Andy W. H. Khong, Suzy J. Styles, Sanjeev Khudanpur
We propose a novel model to hierarchically incorporate phoneme and phonotactic information for language identification (LID) without requiring phoneme annotations for training.
1 code implementation • 7 Mar 2022 • Hexin Liu, Leibny Paola Garcia Perera, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles, Sanjeev Khudanpur
In this paper, we propose to employ a dual-mode framework on the x-vector self-attention (XSA-LID) model with knowledge distillation (KD) to enhance its language identification (LID) performance for both long and short utterances.
1 code implementation • 26 Oct 2021 • Viet-Anh Nguyen, Anh H. T. Nguyen, Andy W. H. Khong
We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension.
Ranked #3 on Audio Super-Resolution on VCTK Multi-Speaker
2 code implementations • 30 Jan 2021 • Karn Watcharasupat, Anh H. T. Nguyen, Ching-Hui Ooi, Andy W. H. Khong
In blind source separation of speech signals, the inherent imbalance in the source spectrum poses a challenge for methods that rely on single-source dominance for the estimation of the mixing matrix.
2 code implementations • 19 Jun 2017 • Anh H. T. Nguyen, V.G. Reju, Andy W. H. Khong, Ing Yann Soon
Our approach is a generalization of sparse filtering and K-hyperlines clustering.