no code implementations • 28 May 2024 • Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng
There is growing research interest in directly translating speech from one language to another, known as end-to-end speech-to-speech translation.
no code implementations • 10 Apr 2024 • Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng
In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation.
no code implementations • 16 Nov 2021 • Midia Yousefi, John H. L. Hansen
A long-standing problem in supervised speech separation is finding the correct label for each separated speech signal, referred to as label permutation ambiguity.
no code implementations • 30 Oct 2021 • Midia Yousefi, John H. L. Hansen
Most current speech technology systems are designed to operate well even in the presence of multiple active speakers.
no code implementations • 30 Oct 2021 • Midia Yousefi, John H. L. Hansen
The speaker conditioning process allows the acoustic model to perform computation in the context of target-speaker auxiliary information.
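Such conditioning is often realized by projecting a target-speaker embedding into an affine (scale-and-shift) modulation of the acoustic model's hidden activations. The sketch below illustrates this idea only; the projection matrices, dimensions, and function names are hypothetical and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def affine_condition(hidden, spk_emb, W_scale, W_bias):
    """Illustrative affine speaker conditioning (hypothetical sketch).

    hidden:  (frames, hidden_dim) acoustic-model activations
    spk_emb: (emb_dim,) target-speaker embedding (e.g. a d-vector)

    The embedding is projected to a per-dimension scale and bias that
    modulate the hidden activations, so subsequent layers compute in
    the context of the target speaker.
    """
    scale = spk_emb @ W_scale  # (hidden_dim,)
    bias = spk_emb @ W_bias    # (hidden_dim,)
    return hidden * (1.0 + scale) + bias

emb_dim, hidden_dim = 8, 16
W_scale = 0.1 * rng.normal(size=(emb_dim, hidden_dim))
W_bias = 0.1 * rng.normal(size=(emb_dim, hidden_dim))
hidden = rng.normal(size=(10, hidden_dim))
spk_emb = rng.normal(size=(emb_dim,))

out = affine_condition(hidden, spk_emb, W_scale, W_bias)
```

With a zero embedding the transform reduces to the identity, so the conditioning acts as a learned perturbation around the unconditioned acoustic model.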
no code implementations • 4 Aug 2019 • Midia Yousefi, Soheil Khorram, John H. L. Hansen
The recently proposed Permutation Invariant Training (PIT) addresses this problem by determining the output-label assignment that minimizes the separation error.