1 code implementation • 14 Dec 2023 • Kangwook Jang, Sungnyun Kim, Hoirin Kim
Despite the strong performance of Transformer-based speech self-supervised learning (SSL) models, their large parameter size and computational cost make them difficult to deploy.
1 code implementation • 19 May 2023 • Kangwook Jang, Sungnyun Kim, Se-Young Yun, Hoirin Kim
Transformer-based speech self-supervised learning (SSL) models, such as HuBERT, show surprising performance in various speech processing tasks.
no code implementations • 26 Oct 2022 • Myunghun Jung, Hoirin Kim
Many recent loss functions in deep metric learning are expressed in logarithmic and exponential forms, and they involve margin and scale as essential hyper-parameters.
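The role of margin and scale in this family of losses can be illustrated with an additive-margin softmax, a common instance of such logarithmic-exponential losses. This is a minimal sketch, not the specific loss proposed in the paper; the margin and scale values are illustrative defaults.

```python
import math

def additive_margin_softmax_loss(cosines, target, margin=0.35, scale=30.0):
    """Cross-entropy over scaled cosine-similarity logits, with an
    additive margin subtracted from the target class. `margin` and
    `scale` are the two hyper-parameters the abstract refers to."""
    logits = [scale * (c - margin) if i == target else scale * c
              for i, c in enumerate(cosines)]
    # numerically stable log-sum-exp
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]
```

Increasing the margin makes a correctly classified example incur a larger loss, which forces tighter intra-class clustering; the scale sharpens the softmax over cosine logits.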
1 code implementation • 1 Jul 2022 • Yeonghyeon Lee, Kangwook Jang, Jahyun Goo, Youngmoon Jung, Hoirin Kim
Our method reduces the model to 23.8% in size and 35.9% in inference time compared to HuBERT.
no code implementations • 4 Apr 2022 • Youngsik Eom, Yeonghyeon Lee, Ji Sub Um, Hoirin Kim
Furthermore, we show that the proposed system significantly improves performance in low-resource and cross-dataset settings of the anti-spoofing task, demonstrating that our system is also robust to data size and data distribution.
no code implementations • 30 Mar 2022 • Myunghun Jung, Hoirin Kim
Acoustic word embeddings (AWEs) are discriminative representations of speech segments, and the learned embedding space reflects the phonetic similarity between words.
no code implementations • 1 Jan 2021 • Seong Min Kye, Hae Beom Lee, Hoirin Kim, Sung Ju Hwang
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples, or with a confidence-weighted average of all the query samples.
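The confidence-weighted prototype update described above can be sketched as follows. This is a simplified illustration, assuming prototypical-network-style confidences computed from negative squared Euclidean distances; the function names and weighting scheme are assumptions, not the paper's exact method.

```python
import math

def softmax_confidence(query, prototypes):
    """Confidence of one query toward each class prototype, from a
    softmax over negative squared Euclidean distances."""
    d = [-sum((q - p) ** 2 for q, p in zip(query, proto)) for proto in prototypes]
    m = max(d)
    e = [math.exp(x - m) for x in d]
    s = sum(e)
    return [x / s for x in e]

def refine_prototypes(prototypes, queries):
    """Transductive step: update each class prototype with the
    confidence-weighted mean of the original prototype and all
    (unlabeled) query samples."""
    refined = []
    for k, proto in enumerate(prototypes):
        weights = [1.0] + [softmax_confidence(q, prototypes)[k] for q in queries]
        points = [proto] + queries
        total = sum(weights)
        refined.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(len(proto))])
    return refined
```

Queries close to a prototype pull it toward them with weight near 1, while distant queries contribute almost nothing, so each prototype drifts toward its own cluster of unlabeled queries.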
no code implementations • 2 Nov 2020 • Yeunju Choi, Youngmoon Jung, Youngjoo Suh, Hoirin Kim
Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited training data or information loss during knowledge distillation.
no code implementations • 6 Oct 2020 • Youngmoon Jung, Yeunju Choi, Hyungjun Lim, Hoirin Kim
At the same time, there are increasing requirements for a speaker verification (SV) system: it should be robust to short speech segments, especially in noisy and reverberant environments.
no code implementations • 9 Aug 2020 • Yeunju Choi, Youngmoon Jung, Hoirin Kim
While deep learning has made impressive progress in speech synthesis and voice conversion, the assessment of the synthesized speech is still carried out by human participants.
no code implementations • 16 Jul 2020 • Yeunju Choi, Youngmoon Jung, Hoirin Kim
In this paper, we propose a multi-task learning (MTL) method to improve the performance of a mean opinion score (MOS) prediction model using the following two auxiliary tasks: spoofing detection (SD) and spoofing type classification (STC).
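A standard way to combine a main task with auxiliary tasks in MTL is a weighted sum of the per-task losses. The sketch below is an illustration of that pattern only; the weighting values and function names are assumptions, not taken from the paper.

```python
def multitask_loss(mos_loss, sd_loss, stc_loss, w_sd=0.1, w_stc=0.1):
    """Combined objective: main MOS-prediction loss plus weighted
    auxiliary losses for spoofing detection (SD) and spoofing type
    classification (STC). Weights are illustrative hyper-parameters."""
    return mos_loss + w_sd * sd_loss + w_stc * stc_loss
```

The auxiliary weights control how strongly the SD and STC gradients shape the shared representation relative to the main MOS objective.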
1 code implementation • Interspeech 2020 • Sunghee Jung, Hoirin Kim
To deal with this issue, we propose two models, hard pitchtron and soft pitchtron, and release the toolkit and corpus that we have developed.
no code implementations • 8 May 2020 • Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim
Keyword spotting (KWS) and speaker verification (SV) have been studied independently although it is known that acoustic and speaker domains are complementary.
no code implementations • 7 Apr 2020 • Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim
In this approach, we obtain a speaker embedding vector by pooling single-scale features that are extracted from the last layer of a speaker feature extractor.
1 code implementation • 6 Apr 2020 • Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim
By combining these two learning schemes, our model outperforms existing state-of-the-art speaker verification models learned with a standard supervised learning framework on short utterances (1-2 seconds) on the VoxCeleb datasets.
1 code implementation • 27 Mar 2020 • Joohyung Lee, Youngmoon Jung, Hoirin Kim
The results show that the focal loss can improve the performance in various imbalance situations compared to the cross entropy loss, a commonly used loss function in VAD.
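For binary VAD (speech vs. non-speech), the focal loss down-weights easy, well-classified frames so that training concentrates on hard or rare-class frames; setting gamma to zero recovers plain cross entropy. A minimal sketch, with an illustrative default for gamma:

```python
import math

def binary_focal_loss(p, y, gamma=2.0):
    """Focal loss for a binary frame label y in {0, 1} given the
    predicted speech probability p. The (1 - p_t)**gamma factor
    shrinks the loss of confident, easy frames; gamma=0 gives the
    ordinary cross entropy."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

With a skewed speech/non-speech ratio, the majority class is usually easy, so its contribution is suppressed and the imbalance hurts training less than with cross entropy.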
1 code implementation • 27 Feb 2020 • Seong Min Kye, Hae Beom Lee, Hoirin Kim, Sung Ju Hwang
To tackle this issue, we propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries such that they improve the model's transductive inference performance on unseen tasks.
no code implementations • 1 Oct 2019 • Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim
Acoustic word embeddings (fixed-dimensional vector representations of arbitrary-length words) have attracted increasing interest in query-by-example spoken term detection.
no code implementations • 26 Sep 2019 • Youngmoon Jung, Yeunju Choi, Hoirin Kim
The first approach is soft VAD, which performs a soft selection of frame-level features extracted from a speaker feature extractor.
no code implementations • 19 Jun 2019 • Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim
Furthermore, we apply deep length normalization by augmenting the loss function with ring loss.
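Ring loss implements "deep length normalization" as a soft penalty: it pushes every embedding's L2 norm toward a common target radius, which in the original ring loss formulation is a learned scalar. The sketch below treats the radius as a fixed argument for simplicity; the weight value is an illustrative assumption.

```python
import math

def ring_loss(embeddings, radius, weight=0.01):
    """Ring loss term added to the primary objective: the mean squared
    deviation of each embedding's L2 norm from the target radius,
    scaled by a loss weight. Driving all norms toward one radius acts
    as a soft length normalization."""
    total = 0.0
    for e in embeddings:
        norm = math.sqrt(sum(x * x for x in e))
        total += (norm - radius) ** 2
    return weight * total / (2 * len(embeddings))
```

Embeddings already lying on the target ring contribute nothing, so the penalty vanishes once all features share the same norm.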
no code implementations • 7 Nov 2018 • Hyungjun Lim, Younggwan Kim, Youngmoon Jung, Myunghun Jung, Hoirin Kim
Previous research on acoustic word embeddings used in query-by-example spoken term detection has shown remarkable performance improvements when using a triplet network.
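A triplet network trains embeddings with a triplet margin loss: an anchor is pulled toward a positive example of the same word and pushed away from a negative example of a different word. A minimal sketch using squared Euclidean distances; the margin value is illustrative, not from the paper.

```python
def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet margin loss: zero once the anchor-negative distance
    exceeds the anchor-positive distance by at least `margin`."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)
```

Well-separated triplets incur no loss, so training effort concentrates on triplets that still violate the margin.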