Multi-band MelGAN

Introduced by Yang et al. in Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

Multi-band MelGAN, or MB-MelGAN, is a waveform generation model for high-quality text-to-speech. It improves on the original MelGAN in three ways. First, it enlarges the receptive field of the generator, which the authors show benefits speech generation. Second, it replaces the feature matching loss with a multi-resolution STFT loss, which better measures the difference between generated and real speech. Third, it extends MelGAN with multi-band processing: the generator takes mel-spectrograms as input and produces sub-band signals, which are then summed back into full-band signals that serve as the discriminator input.

Source: Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
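
The two ingredients described above, the multi-resolution STFT loss and the sub-band split and merge, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: the loss uses the common spectral-convergence plus log-STFT-magnitude formulation with a Hann window; the (fft_size, hop_size, win_length) triples, the 1e-7 magnitude floor, and the filter-bank parameters (4 sub-bands, 62 taps, cutoff ratio 0.142, Kaiser beta 9.0) are illustrative values; the sub-band processing uses a pseudo-quadrature mirror filter (PQMF) bank; and all function names (stft_magnitude, stft_loss, multi_resolution_stft_loss, pqmf_filters, pqmf_analysis, pqmf_synthesis) are hypothetical.

```python
import numpy as np

# ---- Multi-resolution STFT loss (sketch; names and settings are hypothetical) ----

def stft_magnitude(x, fft_size, hop_size, win_length):
    """Magnitude STFT of a 1-D signal; frames are taken without edge padding."""
    pad = (fft_size - win_length) // 2
    # Hann window of win_length, zero-padded and centred inside fft_size.
    window = np.pad(np.hanning(win_length), (pad, fft_size - win_length - pad))
    n_frames = 1 + (len(x) - fft_size) // hop_size
    frames = np.stack([x[i * hop_size:i * hop_size + fft_size] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * window, axis=-1)) ** 2
    return np.sqrt(np.maximum(power, 1e-7))  # floor avoids log(0) below


def stft_loss(fake, real, fft_size, hop_size, win_length):
    """Spectral-convergence and log-magnitude terms at one STFT resolution."""
    s_fake = stft_magnitude(fake, fft_size, hop_size, win_length)
    s_real = stft_magnitude(real, fft_size, hop_size, win_length)
    sc = np.linalg.norm(s_real - s_fake) / np.linalg.norm(s_real)  # Frobenius norms
    mag = np.mean(np.abs(np.log(s_real) - np.log(s_fake)))          # mean L1 of log-magnitudes
    return sc, mag


# Assumed (fft_size, hop_size, win_length) triples, not taken from the source.
RESOLUTIONS = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]


def multi_resolution_stft_loss(fake, real, resolutions=RESOLUTIONS):
    """Average spectral convergence plus average log-magnitude loss over resolutions."""
    sc_sum = mag_sum = 0.0
    for fft_size, hop_size, win_length in resolutions:
        sc, mag = stft_loss(fake, real, fft_size, hop_size, win_length)
        sc_sum += sc
        mag_sum += mag
    return (sc_sum + mag_sum) / len(resolutions)


# ---- PQMF sub-band analysis / synthesis (sketch; parameters are assumptions) ----

def pqmf_filters(subbands=4, taps=62, cutoff_ratio=0.142, beta=9.0):
    """Cosine-modulated analysis/synthesis filters built from a Kaiser-windowed prototype."""
    n = np.arange(taps + 1) - taps / 2              # time index centred on the filter
    with np.errstate(divide="ignore", invalid="ignore"):
        ideal = np.sin(np.pi * cutoff_ratio * n) / (np.pi * n)  # ideal low-pass response
    ideal[taps // 2] = cutoff_ratio                 # limit of the expression at n = 0
    proto = ideal * np.kaiser(taps + 1, beta)       # windowed low-pass prototype
    k = np.arange(subbands)[:, None]
    phase = (2 * k + 1) * np.pi / (2 * subbands) * n
    offset = (-1.0) ** k * np.pi / 4
    analysis = 2 * proto * np.cos(phase + offset)
    synthesis = 2 * proto * np.cos(phase - offset)  # equals the time-reversed analysis filters
    return analysis, synthesis


def pqmf_analysis(x, analysis):
    """Full-band signal -> (subbands, len(x) // subbands) array (len(x) divisible by subbands)."""
    m = analysis.shape[0]
    return np.stack([np.convolve(x, h, mode="same")[::m] for h in analysis])


def pqmf_synthesis(bands, synthesis):
    """Sub-band array -> full-band signal: upsample each band, filter it, and sum."""
    m, length = bands.shape
    out = np.zeros(m * length)
    for band, g in zip(bands, synthesis):
        up = np.zeros(m * length)
        up[::m] = m * band                          # zero-insertion upsampling with gain m
        out += np.convolve(up, g, mode="same")
    return out


rng = np.random.default_rng(0)
real = rng.standard_normal(16000)
fake = real + 0.1 * rng.standard_normal(16000)
loss = multi_resolution_stft_loss(fake, real)       # positive; exactly 0.0 if fake == real

analysis, synthesis = pqmf_filters()
bands = pqmf_analysis(real, analysis)               # shape (4, 4000): what the generator would emit
recon = pqmf_synthesis(bands, synthesis)            # shape (16000,): full-band discriminator input
```

In this sketch the generator's job would be to predict `bands` directly at a quarter of the sample rate, and the synthesis step sums the filtered sub-bands back into a full-band waveform. Apart from edge effects from zero padding, that waveform closely approximates the original. Evaluating the STFT loss at several resolutions trades time resolution against frequency resolution, so no single window size dominates the comparison.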

Categories

Generative Audio Models