
MelGAN is a non-autoregressive, feed-forward convolutional architecture that generates audio waveforms in a GAN setup. The generator is fully convolutional, taking a mel-spectrogram $s$ as input and producing the raw waveform $x$ as output. Because the mel-spectrogram has a 256× lower temporal resolution than the waveform, the authors upsample the input sequence with a stack of transposed convolutional layers, each followed by a stack of residual blocks with dilated convolutions. Unlike traditional GANs, the MelGAN generator does not take a global noise vector as input.
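The shape arithmetic behind the 256× upsampling can be sketched in a few lines. The per-stage upsampling factors (8, 8, 2, 2), the dilations (1, 3, 9) and the kernel size of 3 are assumptions for illustration; the text above only fixes the overall 256× ratio and the use of dilated convolutions.

```python
from math import prod

# Assumed values, not stated in the description above: four transposed
# convolution stages whose strides multiply to 256, and a residual stack
# of kernel-size-3 convolutions with dilations 1, 3 and 9.
UPSAMPLE_FACTORS = (8, 8, 2, 2)
DILATIONS = (1, 3, 9)
KERNEL_SIZE = 3


def waveform_length(n_frames, factors=UPSAMPLE_FACTORS):
    """Number of output samples produced from n_frames mel-spectrogram frames."""
    return n_frames * prod(factors)


def receptive_field(dilations=DILATIONS, kernel_size=KERNEL_SIZE):
    """Receptive field (in time steps) of a stack of dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


total_upsampling = prod(UPSAMPLE_FACTORS)  # 8 * 8 * 2 * 2 = 256
samples = waveform_length(100)             # 100 frames -> 25600 samples
field = receptive_field()                  # 1 + 2 * (1 + 3 + 9) = 27
```

Under these assumptions, each mel-spectrogram frame maps to 256 waveform samples, and the dilations let each residual stack cover a wide context with only three layers.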

To avoid 'checkerboard artifacts' in the generated audio, MelGAN chooses the kernel size of each transposed convolution to be a multiple of its stride, instead of using PhaseShuffle.
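One way to see why this choice helps is to count how many kernel taps contribute to each output position of a 1-D transposed convolution. The following is a minimal sketch in plain Python; the helper names and the specific sizes are hypothetical and chosen only for illustration.

```python
def overlap_counts(n_in, kernel, stride):
    """Count how many (input, kernel tap) pairs land on each output position
    of a 1-D transposed convolution without padding."""
    n_out = (n_in - 1) * stride + kernel
    counts = [0] * n_out
    for i in range(n_in):
        for k in range(kernel):
            counts[i * stride + k] += 1
    return counts


def interior(counts, kernel):
    """Drop the border positions, where fewer taps overlap in any case."""
    return counts[kernel:-kernel]


# Kernel size is a multiple of the stride: every interior output position
# receives the same number of contributions.
even = interior(overlap_counts(n_in=8, kernel=4, stride=2), kernel=4)

# Kernel size is not a multiple of the stride: the number of contributions
# alternates from one output position to the next.
uneven = interior(overlap_counts(n_in=8, kernel=3, stride=2), kernel=3)
```

In the first case the overlap is uniform, while in the second it alternates between positions, which is the periodic pattern associated with checkerboard artifacts.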

The model uses weight normalization as its normalization technique. The discriminator is window-based, similar to a PatchGAN.
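As a reminder of what weight normalization does, the sketch below reparameterizes a weight vector as a direction `v` and a learned scale `g`, so that the effective weight is `g * v / ||v||`. This is a plain-Python illustration of the general technique, not MelGAN's implementation; the function name and example values are hypothetical.

```python
import math


def weight_norm(v, g):
    """Return the effective weight w = g * v / ||v|| for direction v and scale g."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


# The direction [3, 4] has norm 5, so with g = 2 the effective weight
# has norm 2 regardless of how large v itself is.
w = weight_norm([3.0, 4.0], g=2.0)
w_norm = math.sqrt(sum(x * x for x in w))
```

The norm of the effective weight is set entirely by `g`, which decouples the scale of a layer's weights from their direction.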

Source: MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Latest Papers

PAPER | AUTHORS | DATE
Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN | Congyi Wang, Yu Chen, Bin Wang, Yi Shi | 2021-03-26
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains | Won Jang, Dan Lim, Jaesam Yoon | 2020-11-19
StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization | Ahmed Mustafa, Nicola Pia, Guillaume Fuchs | 2020-11-03
SpeedySpeech: Efficient Neural Speech Synthesis | Jan Vainer, Ondřej Dušek | 2020-08-09
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network | Jinhyeok Yang, Jun-Mo Lee, Youngik Kim, Hoon-Young Cho, Injung Kim | 2020-07-30
Adversarial representation learning for private speech generation | David Ericsson, Adam Östberg, Edvin Listo Zec, John Martinsson, Olof Mogren | 2020-06-16
SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement | Luka Chkhetiani, Levan Bejanidze | 2020-06-13
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis | Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville | 2019-10-08

Tasks

TASK | PAPERS | SHARE
Speech Synthesis | 4 | 80.00%
Speech Enhancement | 1 | 20.00%

Categories