1 code implementation • 12 Mar 2020 • Jing-Yun Xiao, Shuang Yang, Yuan-Hang Zhang, Shiguang Shan, Xilin Chen
Observing the continuity between adjacent frames in the speaking process, and the consistency of motion patterns among different speakers when they pronounce the same phoneme, we model the lip movements in the speaking process as a sequence of apparent deformations in the lip region.
Ranked #6 on Lipreading on CAS-VSR-W1k (LRW-1000)
1 code implementation • 6 Mar 2020 • Yuan-Hang Zhang, Shuang Yang, Jing-Yun Xiao, Shiguang Shan, Xilin Chen
Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR).
Ranked #2 on Lipreading on GRID corpus (mixed-speech)
2 code implementations • 16 Oct 2018 • Shuang Yang, Yuan-Hang Zhang, Dalu Feng, Mingmin Yang, Chenhao Wang, Jing-Yun Xiao, Keyu Long, Shiguang Shan, Xilin Chen
This benchmark shows large variation in several aspects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up.
Ranked #2 on Lipreading on LRW-1000
no code implementations • 15 Oct 2018 • Jing-Yun Xiao
Motivated by two problems in lipreading, namely words with similar pronunciation and variation in word duration, we propose a novel 3D Feature Pyramid Attention (3D-FPA) module to jointly improve the representation power of features in both the spatial and temporal domains.
no code implementations • 12 Jan 2017 • Yuan-Hang Zhang, Xie Li, Jing-Yun Xiao
In this paper, we propose an edge detector architecture for color images based on fuzzy theory and the Sobel operator.