Search Results for author: Neil Zeghidour

Found 35 papers, 15 papers with code

MAD Speech: Measures of Acoustic Diversity of Speech

no code implementations • 16 Apr 2024 • Matthieu Futeral, Andrea Agostinelli, Marco Tagliasacchi, Neil Zeghidour, Eugene Kharitonov

Using these datasets, we demonstrate that our proposed metrics achieve a stronger agreement with the ground-truth diversity than baselines.


MusicRL: Aligning Music Generation to Human Preferences

no code implementations • 6 Feb 2024 • Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, Andrea Agostinelli

MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards.

Music Generation


TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition

no code implementations • 21 Aug 2023 • Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey

The model operates on transcripts and audio token sequences and achieves multiple tasks through masking of inputs.

Automatic Speech Recognition (ASR) +4


AudioPaLM: A Large Language Model That Can Speak and Listen

no code implementations • 22 Jun 2023 • Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank

AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.

Language Modelling Large Language Model +5


SoundStorm: Efficient Parallel Audio Generation

1 code implementation • 16 May 2023 • Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi

We present SoundStorm, a model for efficient, non-autoregressive audio generation.

Audio Generation

1,123


Speech Intelligibility Classifiers from 550k Disordered Speech Samples

no code implementations • 13 Mar 2023 • Subhashini Venugopalan, Jimmy Tobin, Samuel J. Yang, Katie Seaver, Richard J. N. Cave, Pan-Pan Jiang, Neil Zeghidour, Rus Heywood, Jordan Green, Michael P. Brenner

We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale.


DNArch: Learning Convolutional Neural Architectures by Backpropagation

no code implementations • 10 Feb 2023 • David W. Romero, Neil Zeghidour

We present Differentiable Neural Architectures (DNArch), a method that jointly learns the weights and the architecture of Convolutional Neural Networks (CNNs) by backpropagation.


SingSong: Generating musical accompaniments from singing

no code implementations • 30 Jan 2023 • Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice.

Audio Generation Retrieval


MusicLM: Generating Music From Text

3 code implementations • 26 Jan 2023 • Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank

We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff".

Ranked #8 on Text-to-Music Generation on MusicCaps

Music Generation Text-to-Music Generation

19,742


AudioLM: a Language Modeling Approach to Audio Generation

5 code implementations • 7 Sep 2022 • Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour

We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.

Audio Generation Language Modelling

32,805


Multi-instrument Music Synthesis with Spectrogram Diffusion

1 code implementation • 11 Jun 2022 • Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel

An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes.

Decoder Generative Adversarial Network +1

370


Learning neural audio features without supervision

no code implementations • 29 Mar 2022 • Sarthak Yadav, Neil Zeghidour

Deep audio classification, traditionally cast as training a deep neural network on top of mel-filterbanks in a supervised fashion, has recently benefited from two independent lines of work.

Audio Classification Self-Supervised Learning


Disentangling speech from surroundings with neural embeddings

no code implementations • 29 Mar 2022 • Ahmed Omran, Neil Zeghidour, Zalán Borsos, Félix de Chaumont Quitry, Malcolm Slaney, Marco Tagliasacchi

We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec.

Attribute


General-purpose, long-context autoregressive modeling with Perceiver AR

3 code implementations • 15 Feb 2022 • Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski, Sander Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Simon, Hannah Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel

Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression.

Ranked #35 on Language Modelling on WikiText-103

Density Estimation Language Modelling

407


Learning strides in convolutional neural networks

1 code implementation • ICLR 2022 • Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour

In particular, we show that introducing our layer into a ResNet-18 architecture allows keeping consistent high performance on CIFAR10, CIFAR100 and ImageNet even when training starts from poor random stride configurations.

Image Classification

122


SoundStream: An End-to-End Neural Audio Codec

5 code implementations • 7 Jul 2021 • Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi

We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs.

Decoder Speech Enhancement

3,740


DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

no code implementations • 28 May 2021 • Neil Zeghidour, Olivier Teboul, David Grangier

Our neural algorithm presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting the voice activity of each speaker conditioned on the extracted representations.

Speaker Diarization


Self-Supervised Learning of Audio Representations from Permutations with Differentiable Ranking

no code implementations • 17 Mar 2021 • Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour

Second, we show that inverting permutations is a meaningful pretext task for learning audio representations in an unsupervised fashion.

Classification General Classification +1


LEAF: A Learnable Frontend for Audio Classification

4 code implementations • 21 Jan 2021 • Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

In this work we show that we can train a single learnable frontend that outperforms mel-filterbanks on a wide range of audio signals, including speech, music, audio events and animal sounds, providing a general-purpose learned frontend for audio classification.

Audio Classification General Classification

475


A Universal Learnable Audio Frontend

no code implementations • ICLR 2021 • Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have lived through the history of audio understanding up to today.

Audio Classification


Shuffle to Learn: Self-supervised learning from permutations via differentiable ranking

no code implementations • 1 Jan 2021 • Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour

In particular, we also improve music understanding by reordering spectrogram patches in the frequency space, as well as video classification by reordering frames along the time axis.

General Classification Self-Supervised Learning +1


Contrastive Learning of General-Purpose Audio Representations

2 code implementations • 21 Oct 2020 • Aaqib Saeed, David Grangier, Neil Zeghidour

We introduce COLA, a self-supervised pre-training approach for learning a general-purpose representation of audio.

Ranked #4 on Spoken Command Recognition on Speech Command v2

CoLA Contrastive Learning +2

32,929


Learning from Heterogeneous EEG Signals with Differentiable Channel Reordering

no code implementations • 21 Oct 2020 • Aaqib Saeed, David Grangier, Olivier Pietquin, Neil Zeghidour

We propose CHARM, a method for training a single neural network across inconsistent input channels.

EEG


Wavesplit: End-to-End Speech Separation by Speaker Clustering

no code implementations • 20 Feb 2020 • Neil Zeghidour, David Grangier

Wavesplit infers a set of source representations via clustering, which addresses the fundamental permutation problem of separation.

Ranked #6 on Speech Separation on WHAMR!

Clustering Data Augmentation +1


Deep multi-class learning from label proportions

no code implementations • 30 May 2019 • Gabriel Dulac-Arnold, Neil Zeghidour, Marco Cuturi, Lucas Beyer, Jean-Philippe Vert

We propose a learning algorithm capable of learning from label proportions instead of direct data labels.

Binary Classification General Classification +1


Fully Convolutional Speech Recognition

no code implementations • 17 Dec 2018 • Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert

In this paper we present an alternative approach based solely on convolutional neural networks, leveraging recent advances in acoustic models from the raw waveform and language modeling.

Ranked #3 on Speech Recognition on WSJ eval93

Language Modelling Speech Recognition +1


To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

no code implementations • 9 Dec 2018 • Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve

In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features.

Multi-Task Learning Speaker Recognition +2


Learning to detect dysarthria from raw speech

3 code implementations • 27 Nov 2018 • Juliette Millet, Neil Zeghidour

We extend this approach to paralinguistic classification and propose a neural network that can learn a filterbank, a normalization factor and a compression power from the raw speech, jointly with the rest of the architecture.

General Classification Sentence +2


SING: Symbol-to-Instrument Neural Generator

1 code implementation • NeurIPS 2018 • Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach

On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.

Audio Synthesis Decoder +1

158


End-to-End Speech Recognition From the Raw Waveform

1 code implementation • 19 Jun 2018 • Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux

In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture.

Speech Recognition


Sampling strategies in Siamese Networks for unsupervised speech representation learning

2 code implementations • 30 Apr 2018 • Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz, Emmanuel Dupoux

We apply these results to pairs of words discovered using an unsupervised algorithm and show an improvement on state-of-the-art in unsupervised representation learning using siamese networks.

Representation Learning


Fader Networks: Manipulating Images by Sliding Attributes

no code implementations • NeurIPS 2017 • Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato

This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space.

Attribute Decoder


Learning Filterbanks from Raw Speech for Phone Recognition

2 code implementations • 3 Nov 2017 • Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, Emmanuel Dupoux

We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition.

475


Fader Networks: Manipulating Images by Sliding Attributes

3 code implementations • 1 Jun 2017 • Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato

Attribute Decoder

757


Learning weakly supervised multimodal phoneme embeddings

no code implementations • 23 Apr 2017 • Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux

Recent works have explored deep architectures for learning multimodal speech representation (e.g., audio and images, articulation and audio) in a supervised way.

Multi-Task Learning

