1 code implementation • 20 Oct 2023 • Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world: the perception and understanding of general auditory information, which consists of at least three types of sound: speech, audio events, and music.
no code implementations • 16 Oct 2023 • Jie Tang, Bin He, Junkai Xu, Tian Tan, Zhipeng Wang, Yanmin Zhou, Shuo Jiang
The proposed method simplifies fall-detection data acquisition experiments, provides a novel avenue for generating low-cost synthetic data in scenarios where acquiring data for machine learning is challenging, and paves the way for customizing machine learning configurations.
2 code implementations • 9 Oct 2023 • Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
Audio-visual large language models (LLMs) have drawn significant attention, yet the fine-grained combination of both input streams remains under-explored, which is challenging but necessary for LLMs to understand general video inputs.
no code implementations • 25 Sep 2023 • Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
Q-Former-based LLMs can generalise well to out-of-domain datasets, where 12% relative WER reductions over the Whisper baseline ASR model were achieved on the Eval2000 test set without using any in-domain training data from Switchboard.
no code implementations • 14 Sep 2023 • Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen
In spite of the excellent strides made by end-to-end (E2E) models in speech recognition in recent years, named entity recognition is still challenging but critical for semantic understanding.
no code implementations • 26 Apr 2023 • Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen
Image-based single-modality compression learning approaches have demonstrated exceptionally powerful encoding and decoding capabilities in the past few years, but suffer from blur and severe semantics loss at extremely low bitrates.
no code implementations • 30 Oct 2021 • Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen
Searching in a large goal space poses difficulty for both high-level subgoal generation and low-level policy learning.
no code implementations • 1 Jan 2021 • Haichuan Gao, Zhile Yang, Tian Tan, Feng Chen
Unfortunately, applying traditional Bellman updates to value-function learning can be problematic for learning undiscounted returns and is thus not suitable for optimizing success rate.
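The difficulty mentioned above can be seen with a minimal sketch (a hypothetical two-state toy MDP, not the paper's environment): with a discount factor below one, Bellman backups contract to a fixed point, whereas with the undiscounted return the same backup on a loopy state grows without bound.

```python
import numpy as np

def value_iteration(gamma, n_iters=200):
    # Toy MDP: state 0 loops to itself with reward 1 per step,
    # state 1 is absorbing with value 0.
    V = np.zeros(2)
    history = []
    for _ in range(n_iters):
        # Bellman backup: V(0) <- r + gamma * V(0)
        V = np.array([1.0 + gamma * V[0], 0.0])
        history.append(V[0])
    return history

discounted = value_iteration(gamma=0.9)    # converges to 1 / (1 - 0.9) = 10
undiscounted = value_iteration(gamma=1.0)  # grows by 1 every iteration
```

The sketch shows why an undiscounted objective such as success rate needs a different update rule rather than the standard contraction-based backup.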
no code implementations • 31 Jul 2020 • Qi Liu, Tian Tan, Kai Yu
It is concluded that the beta stabilizer parameter can reduce sensitivity to the learning rate while achieving almost the same performance on DNNs with the ReLU activation function and on LSTMs.
1 code implementation • NeurIPS 2020 • Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen
In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint.
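A minimal sketch of the adjacency-constraint idea, under simplifying assumptions: here adjacency is computed exactly by breadth-first search on a known transition graph, whereas the paper's method would use a learned notion of adjacency; the helper names (`k_step_adjacent`, `constrain_subgoal`) are hypothetical.

```python
from collections import deque

def k_step_adjacent(state, k, neighbors):
    """Return the set of states reachable from `state` within k
    environment steps, via breadth-first search. `neighbors` maps a
    state to its one-step successors."""
    frontier, seen = deque([(state, 0)]), {state}
    while frontier:
        s, d = frontier.popleft()
        if d == k:
            continue
        for s2 in neighbors(s):
            if s2 not in seen:
                seen.add(s2)
                frontier.append((s2, d + 1))
    return seen

def constrain_subgoal(proposed, state, k, neighbors):
    # Restrict the high-level action space: accept the proposed
    # subgoal only if it lies in the k-step adjacent region of the
    # current state; otherwise fall back to a state in that region.
    region = k_step_adjacent(state, k, neighbors)
    return proposed if proposed in region else min(region)

# Usage on a 1-D chain where each state connects to its two neighbours:
chain = lambda s: [s - 1, s + 1]
region = k_step_adjacent(0, 2, chain)  # states within 2 steps of 0
```

The point of the constraint is that subgoals outside the adjacent region cannot be reached by the low-level policy within the subgoal horizon, so excluding them shrinks the search space without discarding useful subgoals.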
1 code implementation • 23 Dec 2019 • Tian Tan, Zhihan Xiong, Vikranth R. Dwaracherla
We use an indexed value function to represent uncertainty in our action-value estimates.
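One way to read "indexed value function" is as a value function that takes a random index as an extra input, so that different index draws yield different plausible action-value estimates. The sketch below is an illustrative tabular version under that assumption (the class name, dimensions, and linear-in-index form are all hypothetical, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

class IndexedQ:
    """Action values as a function of a sampled index z:
    Q(s, a, z) = mean(s, a) + factors(s, a) . z.
    The spread of Q over draws of z expresses uncertainty in the
    action-value estimates; acting greedily under one sampled z
    gives Thompson-sampling-style exploration."""

    def __init__(self, n_states, n_actions, index_dim=5):
        self.mean = np.zeros((n_states, n_actions))
        self.factors = rng.normal(0.0, 0.1, (n_states, n_actions, index_dim))
        self.index_dim = index_dim

    def sample_index(self):
        # One index draw, typically held fixed for a whole episode.
        return rng.normal(size=self.index_dim)

    def q(self, s, z):
        # Q-values for all actions in state s under index z.
        return self.mean[s] + self.factors[s] @ z

    def act(self, s, z):
        # Greedy action w.r.t. the sampled value function.
        return int(np.argmax(self.q(s, z)))

q = IndexedQ(n_states=3, n_actions=2)
z = q.sample_index()
a = q.act(0, z)
```

As training data accumulates, the index-dependent term would shrink where estimates are certain, collapsing the sampled Q-functions toward a single estimate.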