no code implementations • 24 Apr 2023 • Mohan Li, Rama Doddipatla, Catalin Zorila
In previous works, latency was optimised by truncating the online attention weights based on the hard alignments obtained from conventional ASR models, without taking into account the potential loss of ASR accuracy.
Automatic Speech Recognition (ASR) +1
no code implementations • 9 May 2022 • Catalin Zorila, Rama Doddipatla
Improving the accuracy of single-channel automatic speech recognition (ASR) in noisy conditions is challenging.
Automatic Speech Recognition (ASR) +2
no code implementations • 3 May 2022 • Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
In this paper, we explore an improved framework to train a monaural neural enhancement model for robust speech recognition.
no code implementations • 11 Mar 2022 • Mohan Li, Shucong Zhang, Catalin Zorila, Rama Doddipatla
In this paper, we propose an online attention mechanism, known as cumulative attention (CA), for streaming Transformer-based automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
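The paper's cumulative attention (CA) mechanism is not reproduced here; the following is a purely illustrative sketch of the general idea behind streaming attention with a learned stopping decision, where encoder frames are consumed in order and a context vector is accumulated until a halting score fires. The function name, the energy computation, and the sigmoid halting rule are all hypothetical simplifications, not the paper's method.

```python
import numpy as np

def accumulate_and_halt(enc_frames, query, threshold=0.5):
    """Illustrative streaming attention with accumulation and halting.

    Hypothetical simplification: walk over encoder frames in arrival
    order, accumulate a running context weighted by attention energies,
    and stop as soon as a sigmoid halting score on the normalised
    context crosses `threshold`. Returns (context, halting_frame_index).
    """
    d = enc_frames.shape[1]
    context = np.zeros(d)
    norm = 1e-8  # running normaliser for the attention weights
    for t, h in enumerate(enc_frames):
        e = np.exp(np.dot(query, h) / np.sqrt(d))  # unnormalised energy
        context = context + e * h
        norm += e
        # Hypothetical halting rule: sigmoid of query-context similarity.
        halt = 1.0 / (1.0 + np.exp(-np.dot(query, context / norm)))
        if halt > threshold:
            return context / norm, t
    # No halt fired: emit the context over all frames seen so far.
    return context / norm, len(enc_frames) - 1
```

The key property a streaming mechanism like this must have is that the decision to stop at frame `t` uses only frames up to `t`, which is what bounds the latency.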
no code implementations • 15 Nov 2021 • Tobias Cord-Landwehr, Christoph Boeddeker, Thilo von Neumann, Catalin Zorila, Rama Doddipatla, Reinhold Haeb-Umbach
Impressive progress in neural network-based single-channel speech source separation has been made in recent years.
no code implementations • 15 Jun 2021 • Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
The proposed method first uses mixtures of unseparated sources and the mixture invariant training (MixIT) criterion to train a teacher model.
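MixIT itself is a published criterion (Wisdom et al., 2020): the model separates a mixture of mixtures, and the loss is minimised over all binary assignments of estimated sources back to the reference mixtures. A minimal NumPy sketch of that loss, assuming two reference mixtures and mean squared error as the signal-level loss (the function name and MSE choice are illustrative, not the paper's exact setup):

```python
import itertools
import numpy as np

def mixit_loss(mix1, mix2, est_sources):
    """Mixture invariant training (MixIT) loss for two mixtures.

    mix1, mix2 : reference mixtures, each of shape (T,)
    est_sources: model outputs for the mixture of mixtures, shape (M, T)

    Each estimated source is assigned to exactly one of the two
    mixtures; the loss is the minimum MSE over all 2**M assignments.
    """
    M = est_sources.shape[0]
    best = np.inf
    # Enumerate every assignment of the M estimates to mixture 0 or 1.
    for assign in itertools.product([0, 1], repeat=M):
        a = np.asarray(assign)
        remix1 = est_sources[a == 0].sum(axis=0)  # zeros(T) if empty
        remix2 = est_sources[a == 1].sum(axis=0)
        loss = np.mean((mix1 - remix1) ** 2) + np.mean((mix2 - remix2) ** 2)
        best = min(best, loss)
    return best
```

Because the search is over binary assignments rather than permutations, MixIT needs no isolated ground-truth sources, which is what makes it usable for training a teacher on unseparated mixtures; the exhaustive enumeration is fine for small M (e.g. 4 or 8 outputs).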
no code implementations • 26 Apr 2021 • Mohan Li, Catalin Zorila, Rama Doddipatla
Online Transformer-based automatic speech recognition (ASR) systems have been extensively studied due to the increasing demand for streaming applications.
Automatic Speech Recognition (ASR) +2
no code implementations • 7 Feb 2021 • Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
In this paper, we present a novel multi-channel speech extraction system to simultaneously extract multiple clean individual sources from a mixture in noisy and reverberant environments.
no code implementations • 11 Nov 2020 • Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
To reduce the influence of reverberation on spatial feature extraction, a dereverberation pre-processing method has been applied to further improve the separation performance.
1 code implementation • 26 Sep 2019 • Catalin Zorila, Christoph Boeddeker, Rama Doddipatla, Reinhold Haeb-Umbach
Despite the strong modeling power of neural network acoustic models, speech enhancement has been shown to deliver additional word error rate improvements if multi-channel data is available.