1 code implementation • 25 Apr 2023 • Ke Wu, Ehsan Variani, Tom Bagby, Michael Riley
We introduce LAST, a LAttice-based Speech Transducer library in JAX.
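A minimal sketch of the alignment-lattice idea that underlies speech transducers, written in JAX. The function name, shapes, and the simple dynamic program below are illustrative assumptions for exposition; this is not LAST's actual API.

```python
import jax.numpy as jnp

def transducer_forward(blank_logp, label_logp):
    """Total log-prob of all monotone paths through a (T+1) x (U+1) lattice.

    blank_logp: [T, U+1] log-prob of advancing one input frame at each node.
    label_logp: [T+1, U] log-prob of emitting the next reference label.
    """
    T, U = blank_logp.shape[0], label_logp.shape[1]
    # alpha[t, u] = log-prob of reaching lattice node (t, u) from (0, 0).
    alpha = jnp.full((T + 1, U + 1), -jnp.inf).at[0, 0].set(0.0)
    for t in range(T + 1):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            from_blank = alpha[t - 1, u] + blank_logp[t - 1, u] if t > 0 else -jnp.inf
            from_label = alpha[t, u - 1] + label_logp[t, u - 1] if u > 0 else -jnp.inf
            alpha = alpha.at[t, u].set(jnp.logaddexp(from_blank, from_label))
    return alpha[T, U]
```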
no code implementations • 6 Dec 2022 • Soroosh Mariooryad, Matt Shannon, Siyuan Ma, Tom Bagby, David Kao, Daisy Stanton, Eric Battenberg, RJ Skerry-Ryan
We present a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the association between the two modalities when limited paired data is available.
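A toy illustration of the noisy-channel factorization the abstract refers to: the joint over the two modalities factors as p(x, y) = p(y) p(x | y), so the latent sequence can be inferred by Bayes' rule. The discrete distributions and numbers below are made up for illustration; they are not the paper's model.

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp

log_prior = jnp.log(jnp.array([0.6, 0.4]))       # log p(y): prior over 2 texts
log_channel = jnp.log(jnp.array([[0.8, 0.2],
                                 [0.3, 0.7]]))   # log p(x | y), rows indexed by y
x = 1                                            # the observed signal
log_joint = log_prior + log_channel[:, x]        # log p(y) + log p(x | y)
log_posterior = log_joint - logsumexp(log_joint) # Bayes' rule: log p(y | x)
```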
no code implementations • 7 Nov 2021 • Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, David Kao
We call this task "speaker generation", and present TacoSpawn, a system that performs competitively at this task.
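A minimal sketch of the speaker-generation idea: sample a novel speaker embedding from a learned parametric prior. A Gaussian mixture is one common choice of prior and is assumed here; the function and parameter names are illustrative, not TacoSpawn's.

```python
import jax
import jax.numpy as jnp

def sample_speaker(key, mix_logits, means, log_scales):
    """mix_logits: [K]; means, log_scales: [K, D] learned prior parameters."""
    k_key, g_key = jax.random.split(key)
    k = jax.random.categorical(k_key, mix_logits)     # pick a mixture component
    eps = jax.random.normal(g_key, means.shape[-1:])  # standard normal noise
    return means[k] + jnp.exp(log_scales[k]) * eps    # reparameterized sample

key = jax.random.PRNGKey(0)
emb = sample_speaker(key, jnp.zeros(4), jnp.zeros((4, 16)), jnp.zeros((4, 16)))
```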
no code implementations • 15 Oct 2020 • Matt Shannon, Ben Poole, Soroosh Mariooryad, Tom Bagby, Eric Battenberg, David Kao, Daisy Stanton, RJ Skerry-Ryan
Non-saturating generative adversarial network (GAN) training is widely used and has continued to obtain groundbreaking results.
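For reference, the textbook contrast the abstract builds on: the original minimax generator loss log(1 - D(G(z))) saturates when the discriminator confidently rejects fakes, while the non-saturating variant maximizes log D(G(z)) and keeps a useful gradient. A sketch in logit form, as used in practice:

```python
import jax
import jax.numpy as jnp

def saturating_g_loss(fake_logits):
    # Minimize E[log(1 - D(G(z)))] = E[-softplus(logits)]; its gradient
    # vanishes when the discriminator easily detects the fakes.
    return -jnp.mean(jax.nn.softplus(fake_logits))

def non_saturating_g_loss(fake_logits):
    # Minimize -E[log D(G(z))] = E[softplus(-logits)]; gradient stays
    # strong even for confidently rejected fakes.
    return jnp.mean(jax.nn.softplus(-fake_logits))
```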
3 code implementations • 23 Oct 2019 • Eric Battenberg, RJ Skerry-Ryan, Soroosh Mariooryad, Daisy Stanton, David Kao, Matt Shannon, Tom Bagby
Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text.
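A sketch of one location-relative attention step of the kind this line of work analyzes: a GMM-based mechanism whose component means can only move forward along the text, which discourages the skipping and repetition failures described above. Shapes, names, and the unnormalized-Gaussian simplification are assumptions for illustration.

```python
import jax
import jax.numpy as jnp

def gmm_attention_step(prev_means, params, num_positions):
    """params: [K, 3] of per-component (delta, log_scale, mixture_logit)."""
    delta, log_scale, logits = params[:, 0], params[:, 1], params[:, 2]
    means = prev_means + jax.nn.softplus(delta)   # monotone: means only advance
    scales = jnp.exp(log_scale)
    weights = jax.nn.softmax(logits)
    pos = jnp.arange(num_positions)[:, None]      # [N, 1] encoder positions
    comp = jnp.exp(-0.5 * ((pos - means) / scales) ** 2)  # [N, K] Gaussian bumps
    alpha = comp @ weights                        # mix components -> [N]
    return alpha / (alpha.sum() + 1e-8), means    # attention weights, new means
```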
no code implementations • ICLR 2020 • Raza Habib, Soroosh Mariooryad, Matt Shannon, Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, David Kao, Tom Bagby
We present a novel generative model that combines state-of-the-art neural text-to-speech (TTS) with semi-supervised probabilistic latent variable models.
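A generic sketch of the semi-supervised latent-variable treatment alluded to here, in the style of Kingma et al.'s M2 model rather than this paper's exact formulation: labeled utterances use their observed attribute y directly, while for unlabeled ones y is summed out under the inference network q(y | x).

```python
import jax.numpy as jnp

def unlabeled_bound(elbo_per_y, log_qy):
    """Lower bound for an utterance whose attribute y is unobserved.

    elbo_per_y: [K] supervised ELBO evaluated under each candidate label y.
    log_qy:     [K] log q(y | x) from the inference network.
    Returns E_{q(y|x)}[ELBO(x, y)] + H(q(y | x)).
    """
    qy = jnp.exp(log_qy)
    return jnp.sum(qy * (elbo_per_y - log_qy))
```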
1 code implementation • 8 Jun 2019 • Eric Battenberg, Soroosh Mariooryad, Daisy Stanton, RJ Skerry-Ryan, Matt Shannon, David Kao, Tom Bagby
Recent work has explored sequence-to-sequence latent variable models for expressive speech synthesis (supporting control and transfer of prosody and style), but has not presented a coherent framework for understanding the trade-offs between the competing methods.
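One axis along which such methods differ is how much information the latent embedding is allowed to carry, which can be measured by the KL term of the variational objective. A sketch of a capacity-constrained objective of this kind, using a simple penalty form that is a common simplification rather than any one paper's exact Lagrangian:

```python
import jax.numpy as jnp

def capacity_elbo(recon_logp, kl, target_capacity, beta):
    # Maximize reconstruction while holding KL(q(z|x) || p(z)) near the
    # target capacity C, so the embedding carries a controlled amount
    # of information about prosody/style.
    return recon_logp - beta * jnp.abs(kl - target_capacity)
```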
no code implementations • 5 Jun 2019 • Izhak Shafran, Tom Bagby, RJ Skerry-Ryan
On the copy memory task, ceRNNs and uRNNs perform identically, demonstrating that their superior performance over LSTMs is due to their complex-valued nature and linear operators.
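For context, a sketch of the standard copy memory benchmark referenced here: the model sees a short run of random symbols, then a long stretch of blanks, then a "go" marker, and must reproduce the symbols at the end of the sequence. Sequence lengths and vocabulary below are illustrative defaults.

```python
import jax
import jax.numpy as jnp

def copy_task(key, batch, n_symbols=8, seq_len=10, blank_len=100):
    blank, go = n_symbols, n_symbols + 1
    data = jax.random.randint(key, (batch, seq_len), 0, n_symbols)
    x = jnp.full((batch, seq_len + blank_len + seq_len), blank)
    x = x.at[:, :seq_len].set(data)                # symbols to remember
    x = x.at[:, seq_len + blank_len - 1].set(go)   # marker: start recalling
    y = jnp.full_like(x, blank).at[:, -seq_len:].set(data)  # recall targets
    return x, y
```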
2 code implementations • 15 Nov 2018 • Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein
End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition.