1 code implementation • 14 Dec 2023 • Kangwook Jang, Sungnyun Kim, Hoirin Kim
Despite the strong performance of Transformer-based speech self-supervised learning (SSL) models, their large parameter size and computational cost make them difficult to deploy.
1 code implementation • 19 May 2023 • Kangwook Jang, Sungnyun Kim, Se-Young Yun, Hoirin Kim
Transformer-based speech self-supervised learning (SSL) models, such as HuBERT, show surprising performance in various speech processing tasks.
no code implementations • 26 Oct 2022 • Myunghun Jung, Hoirin Kim
Many recent loss functions in deep metric learning are expressed in logarithmic and exponential forms, and they involve margin and scale as essential hyper-parameters.
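The role of margin and scale in this family of losses can be illustrated with an additive-margin softmax, a common instance of such logarithmic-exponential losses. This is a minimal sketch, not the specific loss proposed in the paper; the margin and scale values are illustrative defaults.

```python
import math

def additive_margin_softmax_loss(cosines, target, margin=0.35, scale=30.0):
    """Cross-entropy over scaled cosine-similarity logits, with an
    additive margin subtracted from the target class. `margin` and
    `scale` are the two hyper-parameters the abstract refers to."""
    logits = [scale * (c - margin) if i == target else scale * c
              for i, c in enumerate(cosines)]
    # numerically stable log-sum-exp
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]
```

Increasing the margin makes a correctly classified example incur a larger loss, which forces tighter intra-class clustering; the scale sharpens the softmax over cosine logits.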
1 code implementation • 1 Jul 2022 • Yeonghyeon Lee, Kangwook Jang, Jahyun Goo, Youngmoon Jung, Hoirin Kim
Our method reduces the model to 23.8% in size and 35.9% in inference time compared to HuBERT.
no code implementations • 4 Apr 2022 • Youngsik Eom, Yeonghyeon Lee, Ji Sub Um, Hoirin Kim
Furthermore, we show that the proposed system significantly improves performance in low-resource and cross-dataset settings of the anti-spoofing task, demonstrating that our system is also robust to data size and data distribution.
no code implementations • 30 Mar 2022 • Myunghun Jung, Hoirin Kim
Acoustic word embeddings (AWEs) are discriminative representations of speech segments, and the learned embedding space reflects the phonetic similarity between words.
no code implementations • 1 Jan 2021 • Seong Min Kye, Hae Beom Lee, Hoirin Kim, Sung Ju Hwang
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples, or with a confidence-weighted average of all the query samples.
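The confidence-weighted prototype update described above can be sketched as follows. This is a simplified illustration, assuming prototypical-network-style confidences computed from negative squared Euclidean distances; the function names and weighting scheme are assumptions, not the paper's exact method.

```python
import math

def softmax_confidence(query, prototypes):
    """Confidence of one query toward each class prototype, from a
    softmax over negative squared Euclidean distances."""
    d = [-sum((q - p) ** 2 for q, p in zip(query, proto)) for proto in prototypes]
    m = max(d)
    e = [math.exp(x - m) for x in d]
    s = sum(e)
    return [x / s for x in e]

def refine_prototypes(prototypes, queries):
    """Transductive step: update each class prototype with the
    confidence-weighted mean of the original prototype and all
    (unlabeled) query samples."""
    refined = []
    for k, proto in enumerate(prototypes):
        weights = [1.0] + [softmax_confidence(q, prototypes)[k] for q in queries]
        points = [proto] + queries
        total = sum(weights)
        refined.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(len(proto))])
    return refined
```

Queries close to a prototype pull it toward them with weight near 1, while distant queries contribute almost nothing, so each prototype drifts toward its own cluster of unlabeled queries.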
no code implementations • 2 Nov 2020 • Yeunju Choi, Youngmoon Jung, Youngjoo Suh, Hoirin Kim
Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited training data or information loss during knowledge distillation.
no code implementations • 6 Oct 2020 • Youngmoon Jung, Yeunju Choi, Hyungjun Lim, Hoirin Kim
At the same time, there are increasing requirements for a speaker verification (SV) system: it should be robust to short speech segments, especially in noisy and reverberant environments.
no code implementations • 9 Aug 2020 • Yeunju Choi, Youngmoon Jung, Hoirin Kim
While deep learning has made impressive progress in speech synthesis and voice conversion, the assessment of the synthesized speech is still carried out by human participants.
no code implementations • 16 Jul 2020 • Yeunju Choi, Youngmoon Jung, Hoirin Kim
In this paper, we propose a multi-task learning (MTL) method to improve the performance of a mean opinion score (MOS) prediction model using the following two auxiliary tasks: spoofing detection (SD) and spoofing type classification (STC).
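A standard way to combine a main task with auxiliary tasks in MTL is a weighted sum of the per-task losses. The sketch below is an illustration of that pattern only; the weighting values and function names are assumptions, not taken from the paper.

```python
def multitask_loss(mos_loss, sd_loss, stc_loss, w_sd=0.1, w_stc=0.1):
    """Combined objective: main MOS-prediction loss plus weighted
    auxiliary losses for spoofing detection (SD) and spoofing type
    classification (STC). Weights are illustrative hyper-parameters."""
    return mos_loss + w_sd * sd_loss + w_stc * stc_loss
```

The auxiliary weights control how strongly the SD and STC gradients shape the shared representation relative to the main MOS objective.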
1 code implementation • Interspeech 2020 • Sunghee Jung, Hoirin Kim
To deal with this issue, we propose two models, hard pitchtron and soft pitchtron, and release the toolkit and corpus that we have developed.
no code implementations • 8 May 2020 • Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim
Keyword spotting (KWS) and speaker verification (SV) have been studied independently although it is known that acoustic and speaker domains are complementary.
no code implementations • 7 Apr 2020 • Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim
In this approach, we obtain a speaker embedding vector by pooling single-scale features that are extracted from the last layer of a speaker feature extractor.
1 code implementation • 6 Apr 2020 • Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim
By combining these two learning schemes, our model outperforms existing state-of-the-art speaker verification models learned with a standard supervised learning framework on short utterances (1-2 seconds) on the VoxCeleb datasets.
1 code implementation • 27 Mar 2020 • Joohyung Lee, Youngmoon Jung, Hoirin Kim
The results show that the focal loss can improve the performance in various imbalance situations compared to the cross entropy loss, a commonly used loss function in VAD.
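For binary VAD (speech vs. non-speech), the focal loss down-weights easy, well-classified frames so that training concentrates on hard or rare-class frames; setting gamma to zero recovers plain cross entropy. A minimal sketch, with an illustrative default for gamma:

```python
import math

def binary_focal_loss(p, y, gamma=2.0):
    """Focal loss for a binary frame label y in {0, 1} given the
    predicted speech probability p. The (1 - p_t)**gamma factor
    shrinks the loss of confident, easy frames; gamma=0 gives the
    ordinary cross entropy."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

With a skewed speech/non-speech ratio, the majority class is usually easy, so its contribution is suppressed and the imbalance hurts training less than with cross entropy.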
1 code implementation • 27 Feb 2020 • Seong Min Kye, Hae Beom Lee, Hoirin Kim, Sung Ju Hwang
To tackle this issue, we propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries such that they improve the model's transductive inference performance on unseen tasks.
no code implementations • 1 Oct 2019 • Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim
Acoustic word embeddings (fixed-dimensional vector representations of arbitrary-length words) have attracted increasing interest in query-by-example spoken term detection.
no code implementations • 26 Sep 2019 • Youngmoon Jung, Yeunju Choi, Hoirin Kim
The first approach is soft VAD, which performs a soft selection of frame-level features extracted from a speaker feature extractor.
no code implementations • 19 Jun 2019 • Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim
Furthermore, we apply deep length normalization by augmenting the loss function with ring loss.
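Ring loss implements "deep length normalization" as a soft penalty: it pushes every embedding's L2 norm toward a common target radius, which in the original ring loss formulation is a learned scalar. The sketch below treats the radius as a fixed argument for simplicity; the weight value is an illustrative assumption.

```python
import math

def ring_loss(embeddings, radius, weight=0.01):
    """Ring loss term added to the primary objective: the mean squared
    deviation of each embedding's L2 norm from the target radius,
    scaled by a loss weight. Driving all norms toward one radius acts
    as a soft length normalization."""
    total = 0.0
    for e in embeddings:
        norm = math.sqrt(sum(x * x for x in e))
        total += (norm - radius) ** 2
    return weight * total / (2 * len(embeddings))
```

Embeddings already lying on the target ring contribute nothing, so the penalty vanishes once all features share the same norm.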
no code implementations • 7 Nov 2018 • Hyungjun Lim, Younggwan Kim, Youngmoon Jung, Myunghun Jung, Hoirin Kim
Previous research on acoustic word embeddings used in query-by-example spoken term detection has shown remarkable performance improvements when using a triplet network.
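A triplet network trains embeddings with a triplet margin loss: an anchor is pulled toward a positive example of the same word and pushed away from a negative example of a different word. A minimal sketch using squared Euclidean distances; the margin value is illustrative, not from the paper.

```python
def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet margin loss: zero once the anchor-negative distance
    exceeds the anchor-positive distance by at least `margin`."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)
```

Well-separated triplets incur no loss, so training effort concentrates on triplets that still violate the margin.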