1 code implementation • 14 Mar 2024 • Afrina Tabassum, Dung Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida
Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data but require substantial labeled data to effectively adapt to downstream tasks.
no code implementations • 21 Nov 2023 • Trung Dang, Jasper C. H. Lee, Maoyuan Song, Paul Valiant
The state-of-the-art results for mean estimation in $\mathbb{R}$ are 1) the optimal sub-Gaussian mean estimator of [LV22], achieving the tight sub-Gaussian constant for all distributions with finite but unknown variance, and 2) the analysis of the median-of-means algorithm by [BCL13], together with a lower bound by [DLLO16], characterizing the big-O optimal errors for distributions for which only a $1+\alpha$ moment exists, for $\alpha \in (0, 1)$.
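For intuition, a minimal sketch (assuming NumPy; the group count `k` is an illustrative parameter, not a value from the paper) of the median-of-means estimator analyzed in [BCL13]:

```python
import numpy as np

def median_of_means(samples, k):
    """Median-of-means: split the samples into k groups, average each
    group, and return the median of the group means. The median step
    makes the estimate robust to heavy-tailed outliers."""
    groups = np.array_split(np.asarray(samples, dtype=float), k)
    return float(np.median([g.mean() for g in groups]))

# Heavy-tailed example: Student's t with df=2.5 has only 1+alpha moments
# for alpha < 1.5, so the empirical mean concentrates poorly.
rng = np.random.default_rng(0)
data = rng.standard_t(df=2.5, size=10_000)
print(median_of_means(data, k=20))
```

Choosing `k` trades bias for robustness: more groups tolerate more outliers but average over fewer samples each.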
1 code implementation • 19 Sep 2023 • Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi
Diffusion models power the vast majority of text-to-audio (TTA) generation methods.
Ranked #10 on Audio Generation on AudioCaps
no code implementations • 28 Sep 2022 • Chau Pham, Trung Dang, Peter Chin
Persistence diagrams (PDs), often characterized as sets of birth and death times of homology classes, are known to provide a topological representation of a graph structure, which is often useful in machine learning tasks.
no code implementations • 9 Jul 2022 • Trung Dang, Simon Kornblith, Huy Thong Nguyen, Peter Chin, Maryam Khademi
In this work, we study different approaches to self-supervised pretraining of object detection models.
no code implementations • 8 Dec 2021 • Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida
Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristics of an utterance to match an unseen target speaker without relying on parallel training data.
1 code implementation • NeurIPS 2021 • Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays
Prior works have demonstrated that labels can be revealed analytically from the last layer of certain models (e.g., ResNet), or that they can be reconstructed jointly with model inputs using Gradient Matching [Zhu et al. '19] with additional knowledge about the current state of the model.
Automatic Speech Recognition (ASR) +4
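The analytic label revelation mentioned above can be illustrated with a small NumPy sketch (all names and dimensions are hypothetical). For softmax cross-entropy on a single example, the gradient of the last-layer bias equals `softmax(logits) - onehot(label)`, so its single negative entry exposes the true label:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

# Hypothetical single-example last layer: logits z = W x + b.
rng = np.random.default_rng(0)
num_classes, dim = 10, 32
W = rng.normal(size=(num_classes, dim))
b = rng.normal(size=num_classes)
x = rng.normal(size=dim)
y = 7  # true label, unknown to the observer of the gradients

# Gradient of the cross-entropy loss w.r.t. the bias: p - onehot(y).
p = softmax(W @ x + b)
grad_b = p.copy()
grad_b[y] -= 1.0

# Every entry of p is in (0, 1), so only index y is negative in grad_b.
recovered = int(np.argmin(grad_b))
print(recovered)  # -> 7
```

With a batch of examples the bias gradient is a sum of such terms, which is why batched label reconstruction requires more than this single-example trick.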
1 code implementation • 15 Apr 2021 • Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays
We show that a dropout rate of 0.2 can reduce the speaker identity accuracy to 0% top-1 (0.5% top-5).
Automatic Speech Recognition (ASR) +2