Search Results for author: Jiaming Zhou

Found 10 papers, 3 papers with code

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

no code implementations • 6 Jun 2024 • Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin

To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference.

CKGConv: General Graph Convolution with Continuous Kernels

1 code implementation • 21 Apr 2024 • Liheng Ma, Soumyasundar Pal, Yitian Zhang, Jiaming Zhou, Yingxue Zhang, Mark Coates

In this work, we propose a novel and general graph convolution framework by parameterizing the kernels as continuous functions of pseudo-coordinates derived via graph positional encoding.

Graph Classification, Graph Learning, +1

Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

1 code implementation • 3 Mar 2024 • Kun-Yu Lin, Henghui Ding, Jiaming Zhou, Yu-Ming Tang, Yi-Xing Peng, Zhilin Zhao, Chen Change Loy, Wei-Shi Zheng

To answer this, we establish a CROSS-domain Open-Vocabulary Action recognition benchmark named XOV-Action, and conduct a comprehensive evaluation of five state-of-the-art CLIP-based video learners under various types of domain gaps.

Open Vocabulary Action Recognition

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition

no code implementations • 22 Jan 2024 • Jiaming Zhou, Junwei Liang, Kun-Yu Lin, Jinrui Yang, Wei-Shi Zheng

With the proposed ActionHub dataset, we further propose a novel Cross-modality and Cross-action Modeling (CoCo) framework for ZSAR, which consists of a Dual Cross-modality Alignment module and a Cross-action Invariance Mining module.

Action Recognition, Video Description, +1

GeoDeformer: Geometric Deformable Transformer for Action Recognition

no code implementations • 29 Nov 2023 • Jinhui Ye, Jiaming Zhou, Hui Xiong, Junwei Liang

Specifically, at the core of GeoDeformer is the Geometric Deformation Predictor, a module designed to identify and quantify potential spatial and temporal geometric deformations within the given video.

Action Recognition

Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition

no code implementations • 28 Nov 2023 • Jiaming Zhou, Hanjun Li, Kun-Yu Lin, Junwei Liang

To this end, this work aims to build a weakly supervised end-to-end framework for training recognition models on long videos, with only video-level action category labels.

Action Classification, Action Recognition, +6

PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting

1 code implementation • 4 Oct 2023 • Yujin Tang, Jiaming Zhou, Xiang Pan, Zeying Gong, Junwei Liang

To address these limitations, we introduce the PostRainBench, a comprehensive multi-variable NWP post-processing benchmark consisting of three datasets for NWP post-processing-based precipitation forecasting.

NWP Post-processing, Precipitation Forecasting

CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition

no code implementations • 26 Jul 2023 • Tian-Hao Zhang, Dinghao Zhou, Guiping Zhong, Jiaming Zhou, Baoxiang Li

RNN-T models are widely used in ASR, which rely on the RNN-T loss to achieve length alignment between input audio and target sequence.

Automatic Speech Recognition, speech-recognition, +1

MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition

no code implementations • 22 Feb 2023 • Jiaming Zhou, Shiwan Zhao, Ning Jiang, Guoqing Zhao, Yong Qin

Unsupervised domain adaptation (UDA) aims to improve the performance on the unlabeled target domain by transferring knowledge from the source to the target domain.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Graph-Based High-Order Relation Modeling for Long-Term Action Recognition

no code implementations • CVPR 2021 • Jiaming Zhou, Kun-Yu Lin, Haoxin Li, Wei-Shi Zheng

In this paper, we propose a Graph-based High-order Relation Modeling (GHRM) module to exploit the high-order relations in the long-term actions for long-term action recognition.

Action Recognition, Long-video Activity Recognition, +3
