no code implementations • 6 Jun 2024 • Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin
To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference.
1 code implementation • 21 Apr 2024 • Liheng Ma, Soumyasundar Pal, Yitian Zhang, Jiaming Zhou, Yingxue Zhang, Mark Coates
In this work, we propose a novel and general graph convolution framework by parameterizing the kernels as continuous functions of pseudo-coordinates derived via graph positional encoding.
1 code implementation • 3 Mar 2024 • Kun-Yu Lin, Henghui Ding, Jiaming Zhou, Yu-Ming Tang, Yi-Xing Peng, Zhilin Zhao, Chen Change Loy, Wei-Shi Zheng
To answer this, we establish a CROSS-domain Open-Vocabulary Action recognition benchmark named XOV-Action, and conduct a comprehensive evaluation of five state-of-the-art CLIP-based video learners under various types of domain gaps.
no code implementations • 22 Jan 2024 • Jiaming Zhou, Junwei Liang, Kun-Yu Lin, Jinrui Yang, Wei-Shi Zheng
With the proposed ActionHub dataset, we further propose a novel Cross-modality and Cross-action Modeling (CoCo) framework for ZSAR, which consists of a Dual Cross-modality Alignment module and a Cross-action Invariance Mining module.
no code implementations • 29 Nov 2023 • Jinhui Ye, Jiaming Zhou, Hui Xiong, Junwei Liang
Specifically, at the core of GeoDeformer is the Geometric Deformation Predictor, a module designed to identify and quantify potential spatial and temporal geometric deformations within the given video.
no code implementations • 28 Nov 2023 • Jiaming Zhou, Hanjun Li, Kun-Yu Lin, Junwei Liang
To this end, this work aims to build a weakly supervised end-to-end framework for training recognition models on long videos, with only video-level action category labels.
Ranked #1 on Long-video Activity Recognition on Breakfast
1 code implementation • 4 Oct 2023 • Yujin Tang, Jiaming Zhou, Xiang Pan, Zeying Gong, Junwei Liang
To address these limitations, we introduce the PostRainBench, a comprehensive multi-variable NWP post-processing benchmark consisting of three datasets for NWP post-processing-based precipitation forecasting.
no code implementations • 26 Jul 2023 • Tian-Hao Zhang, Dinghao Zhou, Guiping Zhong, Jiaming Zhou, Baoxiang Li
RNN-T models are widely used in ASR, which rely on the RNN-T loss to achieve length alignment between input audio and target sequence.
no code implementations • 22 Feb 2023 • Jiaming Zhou, Shiwan Zhao, Ning Jiang, Guoqing Zhao, Yong Qin
Unsupervised domain adaptation (UDA) aims to improve the performance on the unlabeled target domain by transferring knowledge from the source to the target domain.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • CVPR 2021 • Jiaming Zhou, Kun-Yu Lin, Haoxin Li, Wei-Shi Zheng
In this paper, we propose a Graph-based High-order Relation Modeling (GHRM) module to exploit the high-order relations in the long-term actions for long-term action recognition.
Ranked #5 on Video Classification on Breakfast