Search Results for author: Sriram Ganapathy

Found 52 papers, 20 papers with code

Overlap-aware End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization

no code implementations • 23 Jan 2024 • Prachi Singh, Sriram Ganapathy

Speaker diarization, the task of segmenting an audio recording based on speaker identity, constitutes an important speech pre-processing step for several downstream applications.

Clustering Graph Clustering +4

Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement

1 code implementation • 9 Jan 2024 • Soumya Dutta, Sriram Ganapathy

The problem of audio-to-audio (A2A) style transfer involves replacing the style features of the source audio with those from the target audio while preserving the content related attributes of the source audio.

Decoder Disentanglement +1

LLM Augmented LLMs: Expanding Capabilities through Composition

1 code implementation • 4 Jan 2024 • Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar

Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains.

Arithmetic Reasoning Code Generation

Summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and LAnguage in Conversational Environments

no code implementations • 21 Nov 2023 • Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy

In multi-lingual societies, where multiple languages are spoken in a small geographic vicinity, informal conversations often involve a mix of languages.

Speaker Diarization

Self-Influence Guided Data Reweighting for Language Model Pre-training

no code implementations • 2 Nov 2023 • Megh Thakkar, Tolga Bolukbasi, Sriram Ganapathy, Shikhar Vashishth, Sarath Chandar, Partha Talukdar

Once the pre-training corpus has been assembled, all data samples in the corpus are treated with equal importance during LM pre-training.

Language Modelling

Accented Speech Recognition With Accent-specific Codebooks

1 code implementation • 24 Oct 2023 • Darshan Prabhu, Preethi Jyothi, Sriram Ganapathy, Vinit Unni

In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.

Accented Speech Recognition Automatic Speech Recognition +2

Speech enhancement with frequency domain auto-regressive modeling

no code implementations • 24 Sep 2023 • Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy

The dereverberated envelope-carrier signals are modulated and the sub-band signals are synthesized to reconstruct the audio signal.

Automatic Speech Recognition (ASR) +2

Multimodal Modeling For Spoken Language Identification

no code implementations • 19 Sep 2023 • Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa

Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.

Language Identification Spoken language identification

MASR: Multi-label Aware Speech Representation

no code implementations • 20 Jul 2023 • Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth

In recent years, speech representation learning has been constructed primarily as a self-supervised learning (SSL) task, using the raw audio signal alone, while ignoring the side-information that is often available for a given speech recording.

Emotion Recognition Language Identification +4

Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications

no code implementations • 14 Jul 2023 • Varun Krishna, Tarun Sai, Sriram Ganapathy

The input to the model consists of audio samples that are windowed and processed with 1-D convolutional layers.

Automatic Speech Recognition (ASR) +3

Enhancing the EEG Speech Match Mismatch Tasks With Word Boundaries

1 code implementation • 1 Jul 2023 • Akshara Soman, Vidhi Sinha, Sriram Ganapathy

Recent studies have shown that the underlying neural mechanisms of human speech comprehension can be analyzed using a match-mismatch classification of the speech stimulus and the neural response.

EEG Sentence

Label Aware Speech Representation Learning For Language Identification

no code implementations • 7 Jun 2023 • Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar

In this paper, we propose a novel framework of combining self-supervised representation learning with the language label information for the pre-training task.

Language Identification Missing Labels +3

Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection

1 code implementation • 22 May 2023 • Debarpan Bhattacharya, Neeraj Kumar Sharma, Debottam Dutta, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, Murali Alagesan

The rich metadata contained demographic information associated with age, gender and geographic location, as well as the health information relating to the symptoms, pre-existing respiratory ailments, comorbidity and SARS-CoV-2 test status.

Fairness

HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition

no code implementations • 14 Apr 2023 • Soumya Dutta, Sriram Ganapathy

The audio and text representations are processed using a set of bi-directional recurrent neural network layers with self-attention that converts each utterance in a given conversation to a fixed dimensional embedding.

Ranked #1 on Multimodal Emotion Recognition on MELD

Emotion Classification Emotion Recognition in Conversation +1

DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments

no code implementations • 1 Mar 2023 • Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy

The challenge attempts to highlight outstanding issues in speaker diarization (SD) in multilingual settings with code-mixing.

Speaker Diarization

Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization

no code implementations • 24 Feb 2023 • Prachi Singh, Amrit Kaul, Sriram Ganapathy

We also propose an approach to jointly update the embedding extractor and the GNN model to perform end-to-end speaker diarization (E2E-SHARC).

Clustering Graph Clustering +3

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

no code implementations • 26 Aug 2022 • Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

In this paper, we propose a model to perform style transfer of speech to singing voice.

Data Augmentation Style Transfer

Interpretable Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection

1 code implementation • 27 Jun 2022 • Debottam Dutta, Debarpan Bhattacharya, Sriram Ganapathy, Amir H. Poorjam, Deepak Mittal, Maneesh Singh

In this paper, we describe an approach for representation learning of audio signals for the task of COVID-19 detection.

Representation Learning Transfer Learning

Analyzing the impact of SARS-CoV-2 variants on respiratory sound signals

no code implementations • 24 Jun 2022 • Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, Murali Alagesan

The COVID-19 outbreak resulted in multiple waves of infections that have been associated with different SARS-CoV-2 variants.

COVID-19 Diagnosis Specificity

Svadhyaya system for the Second Diagnosing COVID-19 using Acoustics Challenge 2021

no code implementations • 11 Jun 2022 • Deepak Mittal, Amir H. Poorjam, Debottam Dutta, Debarpan Bhattacharya, Zemin Yu, Sriram Ganapathy, Maneesh Singh

This report describes the system used for detecting COVID-19 positives using three different acoustic modalities, namely speech, breathing, and cough in the second DiCOVA challenge.

Coswara: A website application enabling COVID-19 screening by analysing respiratory sound samples and health symptoms

1 code implementation • 9 Jun 2022 • Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, Murali Alagesan

The COVID-19 pandemic has accelerated research on the design of alternative, quick and effective COVID-19 diagnosis approaches.

COVID-19 Diagnosis

The Second DiCOVA Challenge: Dataset and performance analysis for COVID-19 diagnosis using acoustics

no code implementations • 4 Oct 2021 • Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Debarpan Bhattacharya, Debottam Dutta, Pravin Mote, Sriram Ganapathy

This paper presents the details of the challenge, which was an open call for researchers to analyze a dataset of audio recordings consisting of breathing, cough and speech signals.

COVID-19 Diagnosis

Self-Supervised Metric Learning With Graph Clustering For Speaker Diarization

1 code implementation • 14 Sep 2021 • Prachi Singh, Sriram Ganapathy

In this paper, we propose an approach that jointly learns the speaker embeddings and the similarity metric using principles of self-supervised learning.

Clustering Graph Clustering +5

Dereverberation of Autoregressive Envelopes for Far-field Speech Recognition

no code implementations • 12 Aug 2021 • Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy

The dereverberated envelopes are used for feature extraction in speech recognition.

Speech Dereverberation speech-recognition +1

End-to-End Speech Recognition With Joint Dereverberation Of Sub-Band Autoregressive Envelopes

1 code implementation • 9 Aug 2021 • Rohit Kumar, Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy

In this paper, we develop a feature enhancement approach using a neural model operating on sub-band temporal envelopes.

Automatic Speech Recognition (ASR) +1

A Multi-Head Relevance Weighting Framework For Learning Raw Waveform Audio Representations

no code implementations • 30 Jul 2021 • Debottam Dutta, Purvi Agrawal, Sriram Ganapathy

The relevance weighted representations are fed to a neural classifier and the whole system is trained jointly for the audio classification objective.

Audio Classification Sound Classification

SRIB-LEAP submission to Far-field Multi-Channel Speech Enhancement Challenge for Video Conferencing

no code implementations • 24 Jun 2021 • R G Prithvi Raj, Rohit Kumar, M K Jayesh, Anurenjan Purushothaman, Sriram Ganapathy, M A Basha Shaik

This paper presents the details of the SRIB-LEAP submission to the ConferencingSpeech challenge 2021.

Speech Enhancement

Towards sound based testing of COVID-19 -- Summary of the first Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge

no code implementations • 21 Jun 2021 • Neeraj Kumar Sharma, Ananya Muguli, Prashant Krishnan, Rohit Kumar, Srikanth Raj Chetupalli, Sriram Ganapathy

As part of the challenge, datasets with breathing, cough, and speech sound samples from COVID-19 and non-COVID-19 individuals were released to the participants.

Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms

1 code implementation • 1 Jun 2021 • Srikanth Raj Chetupalli, Prashant Krishnan, Neeraj Sharma, Ananya Muguli, Rohit Kumar, Viral Nanda, Lancelot Mark Pinto, Prasanta Kumar Ghosh, Sriram Ganapathy

The research direction of identifying acoustic bio-markers of respiratory diseases has received renewed interest following the onset of the COVID-19 pandemic.

Deep Correlation Analysis for Audio-EEG Decoding

no code implementations • 18 May 2021 • Jaswanth Reddy Katthi, Sriram Ganapathy

A deep model is proposed for intra-subject audio-EEG analysis based on directly optimizing the correlation loss.

EEG EEG Decoding

Self-supervised Representation Learning With Path Integral Clustering For Speaker Diarization

1 code implementation • 19 Apr 2021 • Prachi Singh, Sriram Ganapathy

In this paper, we propose a representation learning and clustering algorithm that can be iteratively performed for improved speaker diarization.

Clustering Representation Learning +3

LEAP Submission for the Third DIHARD Diarization Challenge

no code implementations • 6 Apr 2021 • Prachi Singh, Rajat Varma, Venkat Krishnamohan, Srikanth Raj Chetupalli, Sriram Ganapathy

This paper describes the challenge submission, the post-evaluation analysis and improvements observed on the DIHARD-III dataset.

Clustering speaker-diarization +1

Speaker conditioned acoustic modeling for multi-speaker conversational ASR

no code implementations • 5 Apr 2021 • Srikanth Raj Chetupalli, Sriram Ganapathy

The proposed model is a combination of a speaker diarization system and a hybrid automatic speech recognition (ASR) system.

Automatic Speech Recognition (ASR) +3

DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics

no code implementations • 16 Mar 2021 • Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Sharma, Prashant Krishnan, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda

The DiCOVA challenge aims at accelerating research in diagnosing COVID-19 using acoustics (DiCOVA), a topic at the intersection of speech and audio processing, respiratory health diagnosis, and machine learning.

COVID-19 Diagnosis

Deep Multiway Canonical Correlation Analysis for Multi-Subject EEG Normalization

no code implementations • 11 Mar 2021 • Jaswanth Reddy Katthi, Sriram Ganapathy

The experiments are performed on EEG data collected from subjects listening to natural speech and music.

EEG

End-to-end lyrics Recognition with Voice to Singing Style Transfer

1 code implementation • 17 Feb 2021 • Sakya Basak, Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

This approach, called voice to singing (V2S), performs the voice style conversion by modulating the F0 contour of the natural speech with that of a singing voice.

Data Augmentation Language Modelling +2

The Third DIHARD Diarization Challenge

3 code implementations • 2 Dec 2020 • Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman

DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain.

Speaker Diarization +1

Neural PLDA Modeling for End-to-End Speaker Verification

1 code implementation • 11 Aug 2020 • Shreyas Ramoji, Prashant Krishnan, Sriram Ganapathy

Recently, we proposed a neural network approach for backend modeling in speaker verification called the neural PLDA (NPLDA), where the likelihood ratio score of the generative PLDA model is posed as a discriminative similarity function and the learnable parameters of the score function are optimized using a verification cost.

Speaker Recognition Speaker Verification

Deep Self-Supervised Hierarchical Clustering for Speaker Diarization

1 code implementation • 10 Aug 2020 • Prachi Singh, Sriram Ganapathy

In this paper, we propose a novel algorithm for hierarchical clustering which combines the speaker clustering along with a representation learning framework.

Audio and Speech Processing

Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

no code implementations • 7 Aug 2020 • Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy

Automatic speech recognition in reverberant conditions is a challenging task as the long-term envelopes of the reverberant speech are temporally smeared.

Automatic Speech Recognition (ASR) +2

NISP: A Multi-lingual Multi-accent Dataset for Speaker Profiling

1 code implementation • 12 Jul 2020 • Shareef Babu Kalluri, Deepu Vijayasenan, Sriram Ganapathy, Ragesh Rajan M, Prashant Krishnan

Metadata for speaker profiling applications, such as linguistic information, regional information, and physical characteristics of a speaker, is also collected.

Speaker Profiling

Towards Relevance and Sequence Modeling in Language Recognition

no code implementations • 2 Apr 2020 • Bharat Padi, Anand Mohan, Sriram Ganapathy

In particular, a new model is proposed for incorporating relevance in language recognition, where parts of speech data are weighted more based on their relevance for the language recognition task.

Language Identification Speaker Recognition

NPLDA: A Deep Neural PLDA Model for Speaker Verification

1 code implementation • 10 Feb 2020 • Shreyas Ramoji, Prashant Krishnan, Sriram Ganapathy

The likelihood ratio score of the generative PLDA model is posed as a discriminative similarity function and the learnable parameters of the score function are optimized using a verification cost.

Speaker Recognition Speaker Verification

LEAP System for SRE19 CTS Challenge -- Improvements and Error Analysis

no code implementations • 7 Feb 2020 • Shreyas Ramoji, Prashant Krishnan, Bhargavram Mysore, Prachi Singh, Sriram Ganapathy

In this paper, we provide a detailed account of the LEAP SRE system submitted to the CTS challenge focusing on the novel components in the back-end system modeling.

Speaker Recognition Speaker Verification

Pairwise Discriminative Neural PLDA for Speaker Verification

1 code implementation • 20 Jan 2020 • Shreyas Ramoji, Prashant Krishnan V, Prachi Singh, Sriram Ganapathy

The pre-processing steps of linear discriminant analysis (LDA), unit length normalization and within class covariance normalization are all modeled as layers of a neural model and the speaker verification cost functions can be back-propagated through these layers during training.

Speaker Verification

Improving Voice Separation by Incorporating End-to-end Speech Recognition

1 code implementation • 29 Nov 2019 • Naoya Takahashi, Mayank Kumar Singh, Sakya Basak, Parthasaarathy Sudarsanam, Sriram Ganapathy, Yuki Mitsufuji

Despite recent advances in voice separation methods, many challenges remain in realistic scenarios such as noisy recording and the limits of available data.

Automatic Speech Recognition (ASR) +3

Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR

no code implementations • 28 Nov 2019 • Rohit Kumar, Anirudh Sreeram, Anurenjan Purushothaman, Sriram Ganapathy

These models are trained using a paired corpus of clean and noisy recordings (teacher model).

3-D Feature and Acoustic Modeling for Far-Field Speech Recognition

no code implementations • 13 Nov 2019 • Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy

The MAR features are fed to a convolutional neural network (CNN) architecture which performs the joint acoustic modeling on the three dimensions.

Automatic Speech Recognition (ASR) +1

The Second DIHARD Diarization Challenge: Dataset, task, and baselines

1 code implementation • 18 Jun 2019 • Neville Ryant, Kenneth Church, Christopher Cieri, Alejandrina Cristia, Jun Du, Sriram Ganapathy, Mark Liberman

This paper introduces the second DIHARD challenge, the second in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain.

Action Detection Activity Detection +5

Leveraging Native Language Speech for Accent Identification using Deep Siamese Networks

no code implementations • 25 Dec 2017 • Aditya Siddhant, Preethi Jyothi, Sriram Ganapathy

The problem of automatic accent identification is important for several applications like speaker profiling and recognition as well as for improving speech recognition systems.

Speaker Profiling speech-recognition +1

The IBM Speaker Recognition System: Recent Advances and Error Analysis

no code implementations • 5 May 2016 • Seyed Omid Sadjadi, Jason Pelecanos, Sriram Ganapathy

We present the recent advances along with an error analysis of the IBM speaker recognition system for conversational speech.

Automatic Speech Recognition (ASR) +2

The IBM 2016 Speaker Recognition System

no code implementations • 23 Feb 2016 • Seyed Omid Sadjadi, Sriram Ganapathy, Jason W. Pelecanos

In this paper we describe the recent advancements made in the IBM i-vector speaker recognition system for conversational speech.

2k Automatic Speech Recognition +3
