no code implementations • 16 Apr 2024 • Matthieu Futeral, Andrea Agostinelli, Marco Tagliasacchi, Neil Zeghidour, Eugene Kharitonov
Using these datasets, we demonstrate that our proposed metrics achieve a stronger agreement with the ground-truth diversity than baselines.
no code implementations • 6 Feb 2024 • Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, Andrea Agostinelli
MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards.
no code implementations • 21 Aug 2023 • Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey
The model operates on transcripts and audio token sequences and achieves multiple tasks through masking of inputs.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 22 Jun 2023 • Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor, Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank
AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.
1 code implementation • 16 May 2023 • Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
We present SoundStorm, a model for efficient, non-autoregressive audio generation.
no code implementations • 13 Mar 2023 • Subhashini Venugopalan, Jimmy Tobin, Samuel J. Yang, Katie Seaver, Richard J. N. Cave, Pan-Pan Jiang, Neil Zeghidour, Rus Heywood, Jordan Green, Michael P. Brenner
We developed dysarthric speech intelligibility classifiers on 551, 176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale.
no code implementations • 10 Feb 2023 • David W. Romero, Neil Zeghidour
We present Differentiable Neural Architectures (DNArch), a method that jointly learns the weights and the architecture of Convolutional Neural Networks (CNNs) by backpropagation.
no code implementations • 30 Jan 2023 • Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel
We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice.
3 code implementations • 26 Jan 2023 • Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank
We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff".
Ranked #8 on Text-to-Music Generation on MusicCaps
5 code implementations • 7 Sep 2022 • Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.
1 code implementation • 11 Jun 2022 • Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel
An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes.
no code implementations • 29 Mar 2022 • Sarthak Yadav, Neil Zeghidour
Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work.
no code implementations • 29 Mar 2022 • Ahmed Omran, Neil Zeghidour, Zalán Borsos, Félix de Chaumont Quitry, Malcolm Slaney, Marco Tagliasacchi
We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec.
3 code implementations • 15 Feb 2022 • Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski, Sander Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Simon, Hannah Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression.
Ranked #35 on Language Modelling on WikiText-103
1 code implementation • ICLR 2022 • Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour
In particular, we show that introducing our layer into a ResNet-18 architecture allows keeping consistent high performance on CIFAR10, CIFAR100 and ImageNet even when training starts from poor random stride configurations.
5 code implementations • 7 Jul 2021 • Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi
We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs.
no code implementations • 28 May 2021 • Neil Zeghidour, Olivier Teboul, David Grangier
Our neural algorithm presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting the voice activity of each speaker conditioned on the extracted representations.
no code implementations • 17 Mar 2021 • Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour
Second, we show that inverting permutations is a meaningful pretext task for learning audio representations in an unsupervised fashion.
4 code implementations • 21 Jan 2021 • Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi
In this work we show that we can train a single learnable frontend that outperforms mel-filterbanks on a wide range of audio signals, including speech, music, audio events and animal sounds, providing a general-purpose learned frontend for audio classification.
no code implementations • ICLR 2021 • Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi
Mel-filterbanks are fixed, engineered audio features which emulate human perception and have lived through the history of audio understanding up to today.
no code implementations • 1 Jan 2021 • Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour
In particular, we also improve music understanding by reordering spectrogram patches in the frequency space, as well as video classification by reordering frames along the time axis.
2 code implementations • 21 Oct 2020 • Aaqib Saeed, David Grangier, Neil Zeghidour
We introduce COLA, a self-supervised pre-training approach for learning a general-purpose representation of audio.
Ranked #4 on Spoken Command Recognition on Speech Command v2
no code implementations • 21 Oct 2020 • Aaqib Saeed, David Grangier, Olivier Pietquin, Neil Zeghidour
We propose CHARM, a method for training a single neural network across inconsistent input channels.
no code implementations • 20 Feb 2020 • Neil Zeghidour, David Grangier
Wavesplit infers a set of source representations via clustering, which addresses the fundamental permutation problem of separation.
Ranked #6 on Speech Separation on WHAMR!
no code implementations • 30 May 2019 • Gabriel Dulac-Arnold, Neil Zeghidour, Marco Cuturi, Lucas Beyer, Jean-Philippe Vert
We propose a learning algorithm capable of learning from label proportions instead of direct data labels.
no code implementations • 17 Dec 2018 • Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert
In this paper we present an alternative approach based solely on convolutional neural networks, leveraging recent advances in acoustic models from the raw waveform and language modeling.
Ranked #3 on Speech Recognition on WSJ eval93
no code implementations • 9 Dec 2018 • Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve
In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features.
3 code implementations • 27 Nov 2018 • Juliette Millet, Neil Zeghidour
We extend this approach to paralinguistic classification and propose a neural network that can learn a filterbank, a normalization factor and a compression power from the raw speech, jointly with the rest of the architecture.
1 code implementation • NeurIPS 2018 • Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach
On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2, 500 times faster for inference.
1 code implementation • 19 Jun 2018 • Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux
In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture.
2 code implementations • 30 Apr 2018 • Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz, Emmanuel Dupoux
We apply these results to pairs of words discovered using an unsupervised algorithm and show an improvement on state-of-the-art in unsupervised representation learning using siamese networks.
no code implementations • NeurIPS 2017 • Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato
This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.
2 code implementations • 3 Nov 2017 • Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, Emmanuel Dupoux
We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition.
3 code implementations • 1 Jun 2017 • Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato
This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.
no code implementations • 23 Apr 2017 • Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux
Recent works have explored deep architectures for learning multimodal speech representation (e. g. audio and images, articulation and audio) in a supervised way.