Search Results for author: Sameer Khurana

Found 21 papers, 4 papers with code

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization

1 code implementation • 27 Feb 2024 • Yoshiki Masuyama, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Existing neural field (NF)-based methods focused on estimating the magnitude of the HRTF from a given sound source direction, and the magnitude is then converted to a finite impulse response (FIR) filter.

Spatial Interpolation

NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection

no code implementations • 12 Dec 2023 • Zexu Pan, Gordon Wichern, Francois G. Germain, Sameer Khurana, Jonathan Le Roux

Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal from a multi-talker speech signal, in which the attention is derived from the cortical activity.

EEG

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

no code implementations • 30 Oct 2023 • Zexu Pan, Gordon Wichern, Yoshiki Masuyama, Francois G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Target speech extraction aims to extract, based on a given conditioning cue, a target speech signal that is corrupted by interfering sources, such as noise or competing speakers.

Speaker Separation • Speech Enhancement • +1

Generation or Replication: Auscultating Audio Latent Diffusion Models

no code implementations • 16 Oct 2023 • Dimitrios Bralios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

The introduction of audio latent diffusion models possessing the ability to generate realistic sound clips on demand from a text description has the potential to revolutionize how we work with audio.

AudioCaps • Memorization • +1

Direct Text to Speech Translation System using Acoustic Units

no code implementations • 14 Sep 2023 • Victoria Mingote, Pablo Gimeno, Luis Vicente, Sameer Khurana, Antoine Laurent, Jarod Duret

This framework employs text in different source languages as input to generate speech in the target language without the need for text transcriptions in this language.

Decoder • Speech-to-Speech Translation • +2

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation

no code implementations • 1 Jun 2023 • Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James Glass

Having a single model that supports multiple translation tasks is desirable.

Cross-Lingual Transfer • Decoder • +5

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

no code implementations • 21 May 2023 • Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.

On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration

no code implementations • 14 Nov 2022 • Nauman Dawalatabad, Sameer Khurana, Antoine Laurent, James Glass

Dropout-based Uncertainty-driven Self-Training (DUST) proceeds by first training a teacher model on source domain labeled data.

Pseudo Label • Pseudo Label Filtering • +1

SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation

no code implementations • 17 May 2022 • Sameer Khurana, Antoine Laurent, James Glass

We combine state-of-the-art multilingual acoustic frame-level speech representation learning model XLS-R with the Language Agnostic BERT Sentence Embedding (LaBSE) model to create an utterance-level multimodal multilingual speech encoder SAMU-XLSR.

Retrieval • Sentence • +5

CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification

2 code implementations • 13 Mar 2022 • Yuan Gong, Sameer Khurana, Andrew Rouditchenko, James Glass

Audio classification is an active research area with a wide range of applications.

Audio Classification • Knowledge Distillation

Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

no code implementations • 7 Oct 2021 • Sameer Khurana, Antoine Laurent, James Glass

We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

no code implementations • NeurIPS 2021 • Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, James Glass

We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3

Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training

no code implementations • 26 Nov 2020 • Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux

The performance of automatic speech recognition (ASR) systems typically degrades significantly when the training and test data domains are mismatched.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +4

CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning

no code implementations • 4 Jun 2020 • Sameer Khurana, Antoine Laurent, James Glass

The audio encoder is trained to perform a speech-translation retrieval task in a contrastive learning framework.

BIG-bench Machine Learning • Contrastive Learning • +3

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

no code implementations • 3 Jun 2020 • Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass

Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.

Representation Learning • Self-Supervised Learning • +1

Robust Training of Vector Quantized Bottleneck Models

1 code implementation • 18 May 2020 • Adrian Łańcucki, Jan Chorowski, Guillaume Sanchez, Ricard Marxer, Nanxin Chen, Hans J. G. A. Dolfing, Sameer Khurana, Tanel Alumäe, Antoine Laurent

We show that the codebook learning can suffer from poor initialization and non-stationarity of clustered encoder outputs.

Clustering • Disentanglement • +1

DARTS: Dialectal Arabic Transcription System

no code implementations • 26 Sep 2019 • Sameer Khurana, Ahmed Ali, James Glass

We analyze the following: transfer learning from the high-resource broadcast domain to the low-resource dialectal domain, and semi-supervised learning in which we use in-domain unlabeled audio data collected from YouTube.

Language Modelling • Transfer Learning