speech-recognition
999 papers with code • 0 benchmarks • 2 datasets
Benchmarks
These leaderboards are used to track progress in speech-recognition
Libraries
Use these libraries to find speech-recognition models and implementationsMost implemented papers
Split Computing and Early Exiting for Deep Learning Applications: Survey and Research Challenges
Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others.
ISyNet: Convolutional Neural Networks design for AI accelerator
To address this problem we propose a measure of hardware efficiency of neural architecture search space - matrix efficiency measure (MEM); a search space comprising of hardware-efficient operations; a latency-aware scaling method; and ISyNet - a set of architectures designed to be fast on the specialized neural processing unit (NPU) hardware and accurate at the same time.
Robust Speech Recognition via Large-Scale Weak Supervision
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients.
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2. 1%/4. 6% without external language model (LM), 1. 9%/4. 1% with LM and 2. 9%/7. 0% with only 10M parameters on the clean/noisy LibriSpeech test sets.
Unsupervised Cross-lingual Representation Learning for Speech Recognition
This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
An Overview of Multi-Task Learning in Deep Neural Networks
Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery.
Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy.
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition
In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.