Search Results for author: Ehsan Variani

Found 12 papers, 2 papers with code

LAST: Scalable Lattice-Based Speech Modelling in JAX

1 code implementation • 25 Apr 2023 • Ke Wu, Ehsan Variani, Tom Bagby, Michael Riley

We introduce LAST, a LAttice-based Speech Transducer library in JAX.

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition

no code implementations • 16 Feb 2023 • Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran

We propose JEIT, a joint end-to-end (E2E) model and internal language model (ILM) training method that injects large-scale unpaired text into the ILM during E2E training, which improves rare-word speech recognition.

Language Modelling speech-recognition +1

Alignment Entropy Regularization

no code implementations • 22 Dec 2022 • Ehsan Variani, Ke Wu, David Rybach, Cyril Allauzen, Michael Riley

Existing training criteria in automatic speech recognition (ASR) permit the model to freely explore more than one time alignment between the feature and label sequences.

Automatic Speech Recognition (ASR) +1

Modular Hybrid Autoregressive Transducer

no code implementations • 31 Oct 2022 • Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno

In this work, we propose a modular hybrid autoregressive transducer (MHAT) that has structurally separated label and blank decoders to predict label and blank distributions, respectively, along with a shared acoustic encoder.

Decoder Language Modelling +2

UserLibri: A Dataset for ASR Personalization Using Only Text

no code implementations • 2 Jul 2022 • Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey

We experiment on a user-clustered LibriSpeech corpus, supplemented with personalized text-only data for each user from Project Gutenberg.

Language Modelling speech-recognition +1

Global Normalization for Streaming Speech Recognition in a Modular Framework

1 code implementation • 26 May 2022 • Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen

We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition.

Speech Recognition

Improving Rare Word Recognition with LM-aware MWER Training

no code implementations • 15 Apr 2022 • Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach

Language models (LMs) significantly improve the recognition accuracy of end-to-end (E2E) models on words rarely seen during training, when used in either the shallow fusion or the rescoring setups.

Cascaded encoders for unifying streaming and non-streaming ASR

no code implementations • 27 Oct 2020 • Arun Narayanan, Tara N. Sainath, Ruoming Pang, Jiahui Yu, Chung-Cheng Chiu, Rohit Prabhavalkar, Ehsan Variani, Trevor Strohman

The proposed model consists of streaming and non-streaming encoders.

Automatic Speech Recognition (ASR) +2

Hybrid Autoregressive Transducer (hat)

no code implementations • 12 Mar 2020 • Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley

This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems.

Automatic Speech Recognition (ASR) +2

A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition

no code implementations • 26 Feb 2020 • Erik McDermott, Hasim Sak, Ehsan Variani

The proposed approach is evaluated in cross-domain and limited-data scenarios, for which a significant amount of target domain text data is used for LM training, but only limited (or no) {audio, transcript} training data pairs are used to train the RNN-T.

Automatic Speech Recognition (ASR) +2

WEST: Word Encoded Sequence Transducers

no code implementations • 20 Nov 2018 • Ehsan Variani, Ananda Theertha Suresh, Mitchel Weintraub

Most of the parameters in large-vocabulary models are used in the embedding layer, to map categorical features to vectors, and in the softmax layer, for classification weights.

Automatic Speech Recognition (ASR) +2

Non-Adaptive Policies for 20 Questions Target Localization

no code implementations • 22 Apr 2015 • Ehsan Variani, Kamel Lahouel, Avner Bar-Hen, Bruno Jedynak

The problem of target localization with noise is addressed.
