no code implementations • 7 May 2024 • Jinke Li, Xiao He, Chonghua Zhou, Xiaoqiang Cheng, Yang Wen, Dan Zhang
Leveraging the proposed view attention as well as an additional multi-frame streaming temporal attention, we introduce ViewFormer, a vision-centric transformer-based framework for spatiotemporal feature aggregation.
no code implementations • 5 Jan 2024 • Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang
Far-field speech recognition is a challenging task that conventionally uses signal-processing beamforming to address noise and interference problems.
no code implementations • CVPR 2022 • Jinke Li, Xiao He, Yang Wen, Yuan Gao, Xiaoqiang Cheng, Dan Zhang
As an emerging task, panoptic segmentation faces challenges in both semantic segmentation and instance segmentation.