1 code implementation • 25 Apr 2023 • Ke Wu, Ehsan Variani, Tom Bagby, Michael Riley
We introduce LAST, a LAttice-based Speech Transducer library in JAX.
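A minimal sketch of the alignment-lattice idea that underlies speech transducers, written in JAX. The function name, shapes, and the simple dynamic program below are illustrative assumptions for exposition; this is not LAST's actual API.

```python
import jax.numpy as jnp

def transducer_forward(blank_logp, label_logp):
    """Total log-prob of all monotone paths through a (T+1) x (U+1) lattice.

    blank_logp: [T, U+1] log-prob of advancing one input frame at each node.
    label_logp: [T+1, U] log-prob of emitting the next reference label.
    """
    T, U = blank_logp.shape[0], label_logp.shape[1]
    # alpha[t, u] = log-prob of reaching lattice node (t, u) from (0, 0).
    alpha = jnp.full((T + 1, U + 1), -jnp.inf).at[0, 0].set(0.0)
    for t in range(T + 1):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            from_blank = alpha[t - 1, u] + blank_logp[t - 1, u] if t > 0 else -jnp.inf
            from_label = alpha[t, u - 1] + label_logp[t, u - 1] if u > 0 else -jnp.inf
            alpha = alpha.at[t, u].set(jnp.logaddexp(from_blank, from_label))
    return alpha[T, U]
```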
no code implementations • 6 Dec 2022 • Soroosh Mariooryad, Matt Shannon, Siyuan Ma, Tom Bagby, David Kao, Daisy Stanton, Eric Battenberg, RJ Skerry-Ryan
We present a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the association between the two modalities when limited paired data is available.
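A toy illustration of the noisy-channel factorization the abstract refers to: the joint over the two modalities factors as p(x, y) = p(y) p(x | y), so the latent sequence can be inferred by Bayes' rule. The discrete distributions and numbers below are made up for illustration; they are not the paper's model.

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp

log_prior = jnp.log(jnp.array([0.6, 0.4]))       # log p(y): prior over 2 texts
log_channel = jnp.log(jnp.array([[0.8, 0.2],
                                 [0.3, 0.7]]))   # log p(x | y), rows indexed by y
x = 1                                            # the observed signal
log_joint = log_prior + log_channel[:, x]        # log p(y) + log p(x | y)
log_posterior = log_joint - logsumexp(log_joint) # Bayes' rule: log p(y | x)
```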
no code implementations • 7 Nov 2021 • Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, David Kao
We call this task "speaker generation", and present TacoSpawn, a system that performs competitively at this task.
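A minimal sketch of the speaker-generation idea: sample a novel speaker embedding from a learned parametric prior. A Gaussian mixture is one common choice of prior and is assumed here; the function and parameter names are illustrative, not TacoSpawn's.

```python
import jax
import jax.numpy as jnp

def sample_speaker(key, mix_logits, means, log_scales):
    """mix_logits: [K]; means, log_scales: [K, D] learned prior parameters."""
    k_key, g_key = jax.random.split(key)
    k = jax.random.categorical(k_key, mix_logits)     # pick a mixture component
    eps = jax.random.normal(g_key, means.shape[-1:])  # standard normal noise
    return means[k] + jnp.exp(log_scales[k]) * eps    # reparameterized sample

key = jax.random.PRNGKey(0)
emb = sample_speaker(key, jnp.zeros(4), jnp.zeros((4, 16)), jnp.zeros((4, 16)))
```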
no code implementations • 15 Oct 2020 • Matt Shannon, Ben Poole, Soroosh Mariooryad, Tom Bagby, Eric Battenberg, David Kao, Daisy Stanton, RJ Skerry-Ryan
Non-saturating generative adversarial network (GAN) training is widely used and has continued to obtain groundbreaking results.
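For reference, the textbook contrast the abstract builds on: the original minimax generator loss log(1 - D(G(z))) saturates when the discriminator confidently rejects fakes, while the non-saturating variant maximizes log D(G(z)) and keeps a useful gradient. A sketch in logit form, as used in practice:

```python
import jax
import jax.numpy as jnp

def saturating_g_loss(fake_logits):
    # Minimize E[log(1 - D(G(z)))] = E[-softplus(logits)]; its gradient
    # vanishes when the discriminator easily detects the fakes.
    return -jnp.mean(jax.nn.softplus(fake_logits))

def non_saturating_g_loss(fake_logits):
    # Minimize -E[log D(G(z))] = E[softplus(-logits)]; gradient stays
    # strong even for confidently rejected fakes.
    return jnp.mean(jax.nn.softplus(-fake_logits))
```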
3 code implementations • 23 Oct 2019 • Eric Battenberg, RJ Skerry-Ryan, Soroosh Mariooryad, Daisy Stanton, David Kao, Matt Shannon, Tom Bagby
Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text.
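A sketch of one location-relative attention step of the kind this line of work analyzes: a GMM-based mechanism whose component means can only move forward along the text, which discourages the skipping and repetition failures described above. Shapes, names, and the unnormalized-Gaussian simplification are assumptions for illustration.

```python
import jax
import jax.numpy as jnp

def gmm_attention_step(prev_means, params, num_positions):
    """params: [K, 3] of per-component (delta, log_scale, mixture_logit)."""
    delta, log_scale, logits = params[:, 0], params[:, 1], params[:, 2]
    means = prev_means + jax.nn.softplus(delta)   # monotone: means only advance
    scales = jnp.exp(log_scale)
    weights = jax.nn.softmax(logits)
    pos = jnp.arange(num_positions)[:, None]      # [N, 1] encoder positions
    comp = jnp.exp(-0.5 * ((pos - means) / scales) ** 2)  # [N, K] Gaussian bumps
    alpha = comp @ weights                        # mix components -> [N]
    return alpha / (alpha.sum() + 1e-8), means    # attention weights, new means
```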
no code implementations • ICLR 2020 • Raza Habib, Soroosh Mariooryad, Matt Shannon, Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, David Kao, Tom Bagby
We present a novel generative model that combines state-of-the-art neural text-to-speech (TTS) with semi-supervised probabilistic latent variable models.
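A generic sketch of the semi-supervised latent-variable treatment alluded to here, in the style of Kingma et al.'s M2 model rather than this paper's exact formulation: labeled utterances use their observed attribute y directly, while for unlabeled ones y is summed out under the inference network q(y | x).

```python
import jax.numpy as jnp

def unlabeled_bound(elbo_per_y, log_qy):
    """Lower bound for an utterance whose attribute y is unobserved.

    elbo_per_y: [K] supervised ELBO evaluated under each candidate label y.
    log_qy:     [K] log q(y | x) from the inference network.
    Returns E_{q(y|x)}[ELBO(x, y)] + H(q(y | x)).
    """
    qy = jnp.exp(log_qy)
    return jnp.sum(qy * (elbo_per_y - log_qy))
```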
1 code implementation • 8 Jun 2019 • Eric Battenberg, Soroosh Mariooryad, Daisy Stanton, RJ Skerry-Ryan, Matt Shannon, David Kao, Tom Bagby
Recent work has explored sequence-to-sequence latent variable models for expressive speech synthesis (supporting control and transfer of prosody and style), but has not presented a coherent framework for understanding the trade-offs between the competing methods.
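One axis along which such methods differ is how much information the latent embedding is allowed to carry, which can be measured by the KL term of the variational objective. A sketch of a capacity-constrained objective of this kind, using a simple penalty form that is a common simplification rather than any one paper's exact Lagrangian:

```python
import jax.numpy as jnp

def capacity_elbo(recon_logp, kl, target_capacity, beta):
    # Maximize reconstruction while holding KL(q(z|x) || p(z)) near the
    # target capacity C, so the embedding carries a controlled amount
    # of information about prosody/style.
    return recon_logp - beta * jnp.abs(kl - target_capacity)
```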
no code implementations • 5 Jun 2019 • Izhak Shafran, Tom Bagby, RJ Skerry-Ryan
On the copy memory task, ceRNNs and uRNNs perform identically, demonstrating that their superior performance over LSTMs is due to their complex-valued nature and linear operators.
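For context, a sketch of the standard copy memory benchmark referenced here: the model sees a short run of random symbols, then a long stretch of blanks, then a "go" marker, and must reproduce the symbols at the end of the sequence. Sequence lengths and vocabulary below are illustrative defaults.

```python
import jax
import jax.numpy as jnp

def copy_task(key, batch, n_symbols=8, seq_len=10, blank_len=100):
    blank, go = n_symbols, n_symbols + 1
    data = jax.random.randint(key, (batch, seq_len), 0, n_symbols)
    x = jnp.full((batch, seq_len + blank_len + seq_len), blank)
    x = x.at[:, :seq_len].set(data)                # symbols to remember
    x = x.at[:, seq_len + blank_len - 1].set(go)   # marker: start recalling
    y = jnp.full_like(x, blank).at[:, -seq_len:].set(data)  # recall targets
    return x, y
```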
2 code implementations • 15 Nov 2018 • Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein
End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition.