1 code implementation • 25 Apr 2023 • Ke Wu, Ehsan Variani, Tom Bagby, Michael Riley
We introduce LAST, a LAttice-based Speech Transducer library in JAX.
no code implementations • 16 Feb 2023 • Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran
We propose JEIT, a joint end-to-end (E2E) model and internal language model (ILM) training method that injects large-scale unpaired text into the ILM during E2E training, which improves rare-word speech recognition.
no code implementations • 22 Dec 2022 • Ehsan Variani, Ke Wu, David Rybach, Cyril Allauzen, Michael Riley
Existing training criteria in automatic speech recognition (ASR) permit the model to freely explore more than one time alignment between the feature and label sequences.
Automatic Speech Recognition (ASR)
no code implementations • 31 Oct 2022 • Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno
In this work, we propose a modular hybrid autoregressive transducer (MHAT) that has structurally separated label and blank decoders to predict label and blank distributions, respectively, along with a shared acoustic encoder.
no code implementations • 2 Jul 2022 • Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey
We experiment on a user-clustered LibriSpeech corpus, supplemented with personalized text-only data for each user from Project Gutenberg.
1 code implementation • 26 May 2022 • Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen
We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition.
no code implementations • 15 Apr 2022 • Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach
Language models (LMs) significantly improve the recognition accuracy of end-to-end (E2E) models on words rarely seen during training, when used in either the shallow fusion or the rescoring setups.
no code implementations • 27 Oct 2020 • Arun Narayanan, Tara N. Sainath, Ruoming Pang, Jiahui Yu, Chung-Cheng Chiu, Rohit Prabhavalkar, Ehsan Variani, Trevor Strohman
The proposed model consists of streaming and non-streaming encoders.
Automatic Speech Recognition (ASR)
no code implementations • 12 Mar 2020 • Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley
This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems.
Automatic Speech Recognition (ASR)
no code implementations • 26 Feb 2020 • Erik McDermott, Hasim Sak, Ehsan Variani
The proposed approach is evaluated in cross-domain and limited-data scenarios, for which a significant amount of target domain text data is used for LM training, but only limited (or no) {audio, transcript} training data pairs are used to train the RNN-T.
Automatic Speech Recognition (ASR)
no code implementations • 20 Nov 2018 • Ehsan Variani, Ananda Theertha Suresh, Mitchel Weintraub
Most of the parameters in large vocabulary models are used in the embedding layer, to map categorical features to vectors, and in the softmax layer, for classification weights.
Automatic Speech Recognition (ASR)
no code implementations • 22 Apr 2015 • Ehsan Variani, Kamel Lahouel, Avner Bar-Hen, Bruno Jedynak
This paper addresses the problem of target localization with noise.