Parallel Neural Text-to-Speech

ICLR 2020 · Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

In this work, we first propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram. It is fully convolutional and achieves a 46.7 times speed-up over Deep Voice 3 at synthesis while maintaining comparable speech quality using a WaveNet vocoder. ParaNet also produces stable alignment between text and speech on challenging test sentences by iteratively improving the attention in a layer-by-layer manner. Based on ParaNet, we build the first fully parallel neural text-to-speech system using parallel neural vocoders, which can synthesize speech from text through a single feed-forward pass. We investigate several parallel vocoders within the TTS system, including variants of IAF vocoders and a bipartite flow vocoder.
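The difference between autoregressive and non-autoregressive decoding mentioned in the abstract can be illustrated with a toy sketch. This is not ParaNet or Deep Voice 3: the function names, weight matrices, mean-pooled text context, and position embeddings below are all hypothetical, chosen only to show why removing the dependence on the previous frame lets every spectrogram frame be computed in one batched pass.

```python
import numpy as np

def autoregressive_decode(text_feats, W_in, W_prev, n_frames):
    """Toy autoregressive decoder: frame t needs frame t-1, so steps run in sequence."""
    context = text_feats.mean(axis=0)          # (d,) crude summary of the text
    prev = np.zeros(W_prev.shape[0])           # (n_mels,) initial "previous frame"
    frames = []
    for _ in range(n_frames):                  # n_frames sequential steps
        prev = np.tanh(context @ W_in + prev @ W_prev)
        frames.append(prev)
    return np.stack(frames)                    # (n_frames, n_mels)

def parallel_decode(text_feats, W_in, positions):
    """Toy non-autoregressive decoder: each frame depends only on text and position."""
    context = text_feats.mean(axis=0)          # (d,)
    inputs = context[None, :] + positions      # (n_frames, d), no frame-to-frame link
    return np.tanh(inputs @ W_in)              # all frames in a single matrix product

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
n_chars, d, n_mels, n_frames = 12, 16, 8, 40
text_feats = rng.normal(size=(n_chars, d))
W_in = rng.normal(size=(d, n_mels))
W_prev = rng.normal(size=(n_mels, n_mels))
positions = rng.normal(size=(n_frames, d))

ar_out = autoregressive_decode(text_feats, W_in, W_prev, n_frames)
par_out = parallel_decode(text_feats, W_in, positions)
print(ar_out.shape, par_out.shape)
```

Both functions return an array of shape `(n_frames, n_mels)`, but the first needs a loop whose iterations cannot be reordered, while the second is a single feed-forward computation over all frames.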

Methods

Convolution • Deep Voice 3 • Dense Connections • Dilated Causal Convolution • Dropout • DV3 Attention Block • DV3 Convolution Block • GLU • L1 Regularization • Leaky ReLU • LSTM • Mixture of Logistic Distributions • ParaNet • ParaNet Convolution Block • ReLU • Residual Connection • Scaled Dot-Product Attention • Seq2Seq • Sigmoid Activation • Softmax • Softsign Activation • Tanh Activation • Test • WaveNet • Weight Normalization
