no code implementations • 5 Jun 2023 • Jeremy H. M. Wong, Huayun Zhang, Nancy F. Chen
The standard Gaussian Process (GP) only considers a single output sample per input in the training set.
no code implementations • 23 Sep 2021 • Jeremy H. M. Wong, Yifan Gong
Speakers may move around while diarisation is being performed.
no code implementations • 22 Sep 2021 • Jeremy H. M. Wong, Igor Abramovski, Xiong Xiao, Yifan Gong
Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task.
no code implementations • 17 Mar 2020 • Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong
While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved.
Automatic Speech Recognition (ASR) +1