no code implementations • 9 Jan 2024 • Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.
no code implementations • 10 Aug 2023 • Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux
Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization).
2 code implementations • NeurIPS 2023 • Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez
We tackle the task of conditional music generation.
Ranked #4 on Text-to-Music Generation on MusicCaps
1 code implementation • NeurIPS 2023 • Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Défossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi
In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language model.
no code implementations • 30 Sep 2022 • Itai Gat, Felix Kreuk, Tu Anh Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi
This work focuses on improving the robustness of discrete input representations for generative spoken language modeling.
1 code implementation • 30 Sep 2022 • Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi
Finally, we explore the ability of the proposed method to generate audio continuation conditionally and unconditionally.
Ranked #12 on Audio Generation on AudioCaps
1 code implementation • 22 Jun 2022 • Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi
By conducting a series of controlled experiments, we observe the influence of different phonetic content models as well as various feature-injection techniques on enhancement performance, considering both causal and non-causal models.
Automatic Speech Recognition (ASR)
no code implementations • 8 Apr 2022 • Yehoshua Dissen, Felix Kreuk, Joseph Keshet
Specifically, the study focuses on generating high-quality neural speaker representations without any annotated data, as well as on estimating secondary hyperparameters of the model without annotations.
no code implementations • 7 Apr 2022 • Talia Ben-Simon, Felix Kreuk, Faten Awwad, Jacob T. Cohen, Joseph Keshet
Adult language learners adjust their speech to match the tutor's reference.
no code implementations • 14 Nov 2021 • Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi
We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion.
no code implementations • arXiv 2021 • Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
2 code implementations • 27 Jul 2020 • Felix Kreuk, Joseph Keshet, Yossi Adi
Results suggest that our approach surpasses the baseline models and reaches state-of-the-art performance on both data sets.
1 code implementation • NeurIPS 2020 • Yuval Atzmon, Felix Kreuk, Uri Shalit, Gal Chechik
This leads to consistent misclassification of samples from a new distribution, like new combinations of known components.
1 code implementation • 11 Feb 2020 • Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi
Phoneme boundary detection is an essential first step for a variety of speech processing applications such as speaker diarization, speech science, and keyword spotting.
1 code implementation • 7 Feb 2019 • Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet
Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as the carrier.
no code implementations • 13 Feb 2018 • Felix Kreuk, Assi Barak, Shir Aviv-Reuven, Moran Baruch, Benny Pinkas, Joseph Keshet
Deep learning models have been successfully applied to malware detection.
no code implementations • 10 Jan 2018 • Felix Kreuk, Yossi Adi, Moustapha Cisse, Joseph Keshet
We also present two black-box attacks. In the first, the adversarial examples are generated with a system trained on YOHO and used to attack a system trained on NTIMIT. In the second, the adversarial examples are generated with a system trained on Mel-spectrum features and used to attack a system trained on MFCC features.