no code implementations • 9 May 2024 • Zuan Gao, Yuxin Wang, Yadong Qu, Boqiang Zhang, Zixiao Wang, Jianjun Xu, Hongtao Xie
At the pixel level, we reconstruct the original and inverted images to capture character shapes and texture-level linguistic context.
no code implementations • 7 May 2024 • Boqiang Zhang, Hongtao Xie, Zuan Gao, Yuxin Wang
Based on the dataset, we decouple the two types of features through the supervision design.
1 code implementation • 8 Oct 2023 • Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang, Yongdong Zhang
In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP) model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature Distillation framework (named CLIP-OCR) to leverage both visual and linguistic knowledge in CLIP.
1 code implementation • 9 May 2023 • Boqiang Zhang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Yongdong Zhang
Vision models have gained increasing attention due to their simplicity and efficiency in the Scene Text Recognition (STR) task.