1 code implementation • Interspeech 2023 • Chang Zeng, Xin Wang, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi
The ability of countermeasure models to generalize from seen speech synthesis methods to unseen ones has been investigated in the ASVspoof challenge.
no code implementations • 23 Mar 2023 • Haoyu Tang, Zhaoyi Liu, Chang Zeng, Xinfeng Li
To overcome the drawback of universal Transformer models when applying ASR on edge devices, we propose a solution that reuses the block in Transformer models for small-footprint ASR systems, accommodating resource limitations without compromising recognition accuracy.
Ranked #11 on Speech Recognition on AISHELL-1
Automatic Speech Recognition (ASR)
1 code implementation • 22 Feb 2023 • Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang
Visual speech (i.e., lip motion) is highly related to auditory speech due to their co-occurrence and synchronization in speech production.
1 code implementation • Interspeech 2023 • Chunhui Wang, Chang Zeng, Xing He
XiaoiceSing is a singing voice synthesis (SVS) system that aims at generating 48 kHz singing voices.
1 code implementation • 23 Oct 2022 • Chunhui Wang, Chang Zeng, Jun Chen, Xing He
Entertainment-oriented singing voice synthesis (SVS) requires a vocoder to generate high-fidelity (e.g., 48 kHz) audio.
no code implementations • 11 Oct 2022 • Xiaohui Liu, Meng Liu, Lin Zhang, Linjuan Zhang, Chang Zeng, Kai Li, Nan Li, Kong Aik Lee, Longbiao Wang, Jianwu Dang
The Audio Deep Synthesis Detection (ADD) Challenge has been held to detect generated human-like speech.
no code implementations • 1 Sep 2022 • Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi
Conventional automatic speaker verification systems can usually be decomposed into a front-end model, such as a time delay neural network (TDNN), for extracting speaker embeddings and a back-end model, such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA), for similarity scoring.
no code implementations • 1 Sep 2022 • Chang Zeng, Lin Zhang, Meng Liu, Junichi Yamagishi
Current state-of-the-art automatic speaker verification (ASV) systems are vulnerable to presentation attacks, and several countermeasures (CMs), which distinguish bona fide trials from spoofing ones, have been explored to protect ASV.
1 code implementation • 17 Apr 2021 • Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Chang Zeng, Jianwu Dang
Audio-visual (AV) lip biometrics is a promising authentication technique that leverages the benefits of both the audio and visual modalities in speech communication.
1 code implementation • 4 Apr 2021 • Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi
Probabilistic linear discriminant analysis (PLDA) and cosine similarity have been widely used in traditional speaker verification systems as back-end techniques to measure pairwise similarities.
Ranked #1 on Speaker Verification on CN-CELEB