11 code implementations • ICLR 2021 • Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation.
Ranked #2 on Speech Synthesis on LJSpeech
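The denoising-diffusion idea behind DiffWave can be sketched in a few lines: a forward process gradually corrupts clean audio with Gaussian noise, and generation runs the process in reverse. This is a minimal numpy illustration of the generic DDPM recurrences, not DiffWave's architecture; `predict_noise` is a hypothetical stand-in for the trained dilated-convolution network.

```python
import numpy as np

def make_schedule(T=50, beta_min=1e-4, beta_max=0.05):
    """Linear noise schedule; alpha_bars shrink toward zero as t grows."""
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def diffuse(x0, t, alpha_bars, rng):
    """Forward process: corrupt clean audio x0 to noise level t in one shot."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def reverse_step(xt, t, predict_noise, betas, alphas, alpha_bars, rng):
    """One ancestral sampling step x_t -> x_{t-1}, given a noise predictor."""
    eps_hat = predict_noise(xt, t)  # hypothetical trained network
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
    return mean
```

Sampling starts from pure Gaussian noise and applies `reverse_step` for t = T-1 down to 0; conditioning (e.g., on a mel spectrogram) enters through the noise-prediction network.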
no code implementations • ICLR 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
In this work, we first propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.
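The latency advantage of a non-autoregressive decoder like ParaNet is that all spectrogram frames are predicted in one parallel pass instead of one frame at a time. This toy numpy sketch contrasts the two decoding regimes; the weights, the indexing used as a stand-in for attention, and both decode functions are illustrative, not ParaNet's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1     # toy "decoder" weights
text_enc = rng.standard_normal((20, 8))   # encoded text, one row per position

def autoregressive_decode(enc, n_frames):
    """Generate frames sequentially, each conditioned on the previous frame."""
    frames = [np.zeros(enc.shape[1])]
    for t in range(n_frames):  # n_frames dependent steps
        frames.append(np.tanh(frames[-1] @ W + enc[t % len(enc)]))
    return np.stack(frames[1:])

def parallel_decode(enc, n_frames):
    """Predict every frame in a single pass over the text encoding."""
    positions = np.arange(n_frames)
    attended = enc[positions % len(enc)]   # toy stand-in for attention
    return np.tanh(attended @ W)           # one matmul, no sequential dependency
```

The sequential loop needs `n_frames` dependent steps, while the parallel version is a single batched computation, which is what makes non-autoregressive synthesis fast on accelerators.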
4 code implementations • ICML 2020 • Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song
WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.
Ranked #9 on Speech Synthesis on LibriTTS
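The unified view the abstract mentions comes from squeezing a 1-D waveform into a 2-D array of shape (h, T/h) and running autoregression only along the short height dimension. A minimal sketch of that reshape (the exact sample grouping here is illustrative):

```python
import numpy as np

def squeeze_waveform(x, h):
    """Reshape a length-T waveform into (h, T // h); adjacent samples
    share a column, and the model is autoregressive along the height h."""
    assert len(x) % h == 0
    return x.reshape(len(x) // h, h).T

x = np.arange(16.0)
grid = squeeze_waveform(x, 4)   # shape (4, 4)
# h == 1  -> one row: no sequential dependency, WaveGlow-like parallel flow.
# h == T  -> one column: fully sequential, WaveNet-like autoregression.
```

Intermediate values of h (e.g., 8 or 16) trade a few sequential steps for a much smaller model than fully parallel flows, which is the design point WaveFlow occupies between the two special cases.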
no code implementations • 9 Jul 2019 • Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping
In this work, we extend ClariNet (Ping et al., 2019), a fully end-to-end speech synthesis model (i.e., text-to-wave), to generate high-fidelity speech from multiple speakers.
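A common mechanism for extending a single-speaker model to many speakers is a learned speaker-embedding table whose rows bias the model's conditioning features. This is a generic sketch of that idea, not the paper's specific architecture; all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_speakers, emb_dim = 4, 8
# Learned jointly with the model in practice; random here for illustration.
speaker_table = rng.standard_normal((n_speakers, emb_dim))

def condition(text_features, speaker_id):
    """Bias every time step of the text features with one speaker's embedding."""
    return text_features + speaker_table[speaker_id]  # broadcast over time

feats = rng.standard_normal((20, emb_dim))
out_a = condition(feats, 0)   # same text, speaker 0
out_b = condition(feats, 1)   # same text, speaker 1 -> different conditioning
```

Because only the embedding row changes between speakers, one set of network weights can render the same text in any trained voice.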
2 code implementations • ICML 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.
no code implementations • ICLR 2018 • Markus Kliegl, Siddharth Goyal, Kexin Zhao, Kavya Srinet, Mohammad Shoeybi
We propose and evaluate new techniques for compressing and speeding up dense matrix multiplications as found in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR).