Search Results for author: Julian Chan

Found 9 papers, 1 paper with code

On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models

no code implementations • 9 Jul 2021 • Xiaohui Zhang, Vimal Manohar, David Zhang, Frank Zhang, Yangyang Shi, Nayan Singhal, Julian Chan, Fuchun Peng, Yatharth Saraf, Mike Seltzer

Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria.

Automatic Speech Recognition (ASR) +1

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion

no code implementations • 5 Apr 2021 • Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer

How to leverage dynamic contextual information in end-to-end speech recognition has remained an active research area.

Language Modelling, Speech Recognition +1

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

no code implementations • 5 Apr 2021 • Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

DET achieves accuracy similar to a baseline model, with better latency, on a large in-house data set by assigning a lightweight encoder to the beginning part of an utterance and a full-size encoder to the rest.

Speech Recognition

Deep Shallow Fusion for RNN-T Personalization

no code implementations • 16 Nov 2020 • Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer

End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks.

Automatic Speech Recognition (ASR) +1

Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR

no code implementations • 9 Nov 2020 • Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig

In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC and RNN-T.

Automatic Speech Recognition (ASR) +2

Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition

no code implementations • 3 Nov 2020 • Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer

Attention-based models have been gaining popularity recently for their strong performance demonstrated in fields such as machine translation and automatic speech recognition.

Automatic Speech Recognition (ASR) +3

Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications

no code implementations • 27 Oct 2020 • Yongqiang Wang, Yangyang Shi, Frank Zhang, Chunyang Wu, Julian Chan, Ching-Feng Yeh, Alex Xiao

We compare transformer-based acoustic models with their LSTM counterparts on industrial-scale tasks.

Speech Recognition +1

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition

1 code implementation • 21 Oct 2020 • Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Mike Seltzer

For a low latency scenario with an average latency of 80 ms, Emformer achieves WER $3.01\%$ on test-clean and $7.09\%$ on test-other.

Speech Recognition

Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding

no code implementations • 1 Nov 2017 • Anjishnu Kumar, Arpit Gupta, Julian Chan, Sam Tucker, Bjorn Hoffmeister, Markus Dreyer, Stanislav Peshterliev, Ankur Gandhe, Denis Filiminov, Ariya Rastrow, Christian Monson, Agnika Kumar

This paper presents the design of the machine learning architecture that underlies the Alexa Skills Kit (ASK), a large-scale Spoken Language Understanding (SLU) Software Development Kit (SDK) that enables developers to extend the capabilities of Amazon's virtual assistant, Alexa.

Spoken Language Understanding
