1 code implementation • 13 Feb 2024 • Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin
A multimodal large language model (MLLM) agent can receive instructions, capture images, retrieve histories from memory, and decide which tools to use.
2 code implementations • 4 Oct 2023 • Xiangming Gu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, Ye Wang
Investigating when diffusion models memorize their training data, we first observe that memorization tends to occur on smaller datasets, which motivates our definition of effective model memorization (EMM), a metric measuring the maximum size of training data at which a learned diffusion model approximates its theoretical optimum.
1 code implementation • 5 Aug 2023 • Xiangming Gu, Wei Zeng, Ye Wang
Leveraging the prior knowledge that pitch distributions may contribute to the gender bias, we propose conditionally aligning acoustic representations between demographic groups by feeding note events to the attribute predictor.
1 code implementation • 20 Jul 2022 • Longshen Ou, Xiangming Gu, Ye Wang
To close the performance gap between automatic lyric transcription (ALT) and automatic speech recognition (ASR), we attempt to exploit the similarities between speech and singing.
1 code implementation • 13 Jul 2022 • Xiangming Gu, Longshen Ou, Danielle Ong, Ye Wang
Automatic lyric transcription (ALT) is a nascent field of study attracting increasing interest from both the speech and music information retrieval communities, given its significant application potential.
no code implementations • 10 Oct 2020 • Xiangming Gu, Xiang Cheng
Deep neural networks (DNNs) demonstrate great success in classification tasks.