Search Results for author: Ashish Seth

Found 12 papers, 7 papers with code

Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition

1 code implementation · 20 Dec 2023 · Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

First, we perform vanilla continued pre-training of an initial SSL pre-trained model on the target-domain ASR dataset and call the resulting model the teacher.

Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more
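The Stable Distillation snippet describes only how the teacher is obtained. The sketch below shows one way a teacher can regularize a student's continued pre-training. It assumes, beyond what this listing states, that the student keeps its usual SSL loss and adds a mean-squared-error penalty pulling its hidden representations toward the frozen teacher's. The names `hidden_mse`, `regularized_cp_loss`, and the weight `lam` are hypothetical, and plain Python lists stand in for framework tensors. It is not the authors' implementation.

```python
from typing import List

# Hypothetical layout: [num_frames][hidden_dim] representations for one utterance.
Frames = List[List[float]]


def hidden_mse(student: Frames, teacher: Frames) -> float:
    """Mean squared error between student and teacher hidden states of equal shape."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of frames")
    total, count = 0.0, 0
    for s_frame, t_frame in zip(student, teacher):
        if len(s_frame) != len(t_frame):
            raise ValueError("hidden dimensions differ")
        for s, t in zip(s_frame, t_frame):
            total += (s - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty representations")
    return total / count


def regularized_cp_loss(ssl_loss: float, student: Frames, teacher: Frames,
                        lam: float = 1.0) -> float:
    """Assumed student objective: its own SSL loss plus a weighted
    distillation term toward the frozen teacher's representations."""
    return ssl_loss + lam * hidden_mse(student, teacher)


if __name__ == "__main__":
    student = [[1.0, 2.0], [3.0, 4.0]]
    teacher = [[1.0, 2.0], [3.0, 2.0]]
    print(regularized_cp_loss(0.5, student, teacher, lam=0.1))
```

In a real training loop the teacher's representations would be computed without gradients, so only the student's weights move; `lam` trades off adapting to the target domain against staying close to the teacher.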

FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning

1 code implementation · 20 Dec 2023 · Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Continued pre-training (CP) offers multiple advantages, like target domain adaptation and the potential to exploit the continuous stream of unlabeled data available online.

Tasks: Domain Adaptation, Self-Supervised Learning

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

no code implementations · 12 Oct 2023 · Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, S. Ramaneswaran, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

In this paper, we propose CompA, a collection of two expert-annotated benchmarks consisting mostly of real-world audio samples, to evaluate compositional reasoning in ALMs.

Tasks: Attribute, Audio Classification, +1 more

DeAR: Debiasing Vision-Language Models with Additive Residuals

no code implementations · CVPR 2023 · Ashish Seth, Mayur Hemani, Chirag Agarwal

Societal biases in vision-language models manifest as skewed similarity between the representations of specific text concepts and images of people from different identity groups, and therefore limit the usefulness of such models in real-world, high-stakes applications.

Tasks: Attribute, Benchmarking, +2 more

UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation

1 code implementation · 10 Mar 2023 · Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Unlike prior works, which directly fine-tune a self-supervised pre-trained encoder on a target dataset, we use the encoder to generate pseudo-labels for unsupervised fine-tuning before the actual fine-tuning step.

Tasks: Audio Classification, Self-Supervised Learning
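The UNFUSED snippet says the self-supervised encoder is used to generate pseudo-labels before the actual fine-tuning step, but not how. A common way to do this, assumed here rather than taken from the listing, is to cluster the encoder's embeddings with k-means and treat each cluster ID as a class label. The sketch below is plain Lloyd's k-means in pure Python; `kmeans_pseudo_labels` and the deterministic first-k initialisation are hypothetical choices for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pseudo_labels(embeddings: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Cluster encoder embeddings with Lloyd's k-means and return one cluster ID
    per input; the IDs serve as pseudo-labels. Centroids start at the first k
    embeddings so the result is deterministic."""
    if not 0 < k <= len(embeddings):
        raise ValueError("k must be between 1 and len(embeddings)")
    centroids = [list(e) for e in embeddings[:k]]
    labels = [-1] * len(embeddings)
    for _ in range(iters):
        # Assignment step: each embedding goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(e, centroids[c]))
                      for e in embeddings]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [e for e, lab in zip(embeddings, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return labels


if __name__ == "__main__":
    embs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    print(kmeans_pseudo_labels(embs, k=2))  # [0, 0, 1, 1]
```

Under this assumption, the encoder would then be trained with a classification head to predict these pseudo-labels before being fine-tuned on the real labels of the target dataset.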

SLICER: Learning universal audio representations using low-resource self-supervised pre-training

1 code implementation · 2 Nov 2022 · Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification.

Tasks: Audio Classification, Clustering, +3 more

MAST: Multiscale Audio Spectrogram Transformers

1 code implementation · 2 Nov 2022 · Sreyan Ghosh, Ashish Seth, S. Umesh, Dinesh Manocha

We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST).

Tasks: Audio Classification, Keyword Spotting, +1 more

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

no code implementations · 1 Nov 2022 · Anusha Prakash, Arun Kumar, Ashish Seth, Bhagyashree Mukherjee, Ishika Gupta, Jom Kuriakose, Jordan Fernandes, K V Vikram, Mano Ranjith Kumar M, Metilda Sagaya Mary, Mohammad Wajahat, Mohana N, Mudit Batra, Navina K, Nihal John George, Nithya Ravi, Pruthwik Mishra, Sudhanshu Srivastava, Vasista Sai Lodagala, Vandan Mujadia, Kada Sai Venkata Vineeth, Vrunda Sukhadia, Dipti Sharma, Hema Murthy, Pushpak Bhattacharya, S Umesh, Rajeev Sangal

Cross-lingual dubbing of lecture videos requires transcription of the original audio, correction and removal of disfluencies, domain-term discovery, text-to-text translation into the target language, chunking of text using target-language rhythm, and text-to-speech synthesis followed by isochronous lip-syncing to the original video.

Tasks: Chunking, Speech Synthesis, +1 more

Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition

no code implementations · 31 Mar 2022 · Ashish Seth, Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh

Using self-supervised learning (SSL) to learn high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings.

Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning

1 code implementation · 25 Mar 2022 · Sreyan Ghosh, Ashish Seth, Deepak Mittal, Maneesh Singh, S. Umesh

Inspired by the recent progress in self-supervised learning for computer vision, in this paper we introduce DeLoRes, a new general-purpose audio representation learning approach.

Tasks: Representation Learning, Self-Supervised Learning, +1 more
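The DeLoRes title says the method decorrelates latent spaces, and the snippet credits self-supervised learning for computer vision as its inspiration, but the listing does not give the objective. As an illustration only, the sketch below implements a Barlow Twins-style redundancy-reduction loss, a well-known computer-vision SSL objective that we assume is close in spirit. It is not presented as the DeLoRes loss. The function names and the off-diagonal weight `lam = 5e-3` are hypothetical choices.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = batch items, columns = embedding dims


def _standardize(z: Matrix, eps: float = 1e-9) -> Matrix:
    """Normalise every embedding dimension to zero mean, unit variance over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) + eps
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / std
    return out


def decorrelation_loss(z_a: Matrix, z_b: Matrix, lam: float = 5e-3) -> float:
    """Barlow Twins-style loss on embeddings of two augmented views (same batch order).
    Diagonal cross-correlations are pushed to 1 (invariance across views);
    off-diagonal ones are pushed to 0 (decorrelated, less redundant dimensions)."""
    if len(z_a) != len(z_b) or len(z_a[0]) != len(z_b[0]):
        raise ValueError("views must have the same shape")
    a, b = _standardize(z_a), _standardize(z_b)
    n, d = len(a), len(a[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            loss += (1.0 - c_ij) ** 2 if i == j else lam * c_ij ** 2
    return loss


if __name__ == "__main__":
    uncorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    redundant = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
    print(decorrelation_loss(uncorrelated, uncorrelated))
    print(decorrelation_loss(redundant, redundant))
```

In the second call both embedding dimensions carry the same information, so the off-diagonal penalty makes the loss larger than in the first call, where the dimensions are uncorrelated.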

DECAR: Deep Clustering for learning general-purpose Audio Representations

1 code implementation · 17 Oct 2021 · Sreyan Ghosh, Sandesh V Katta, Ashish Seth, S. Umesh

We introduce DECAR, a self-supervised pre-training approach for learning general-purpose audio representations.

Tasks: Clustering, Deep Clustering, +2 more
