no code implementations • 17 Apr 2024 • Wenbo Zhang, Yifan Zhang, Jianfeng Lin, Binqiang Huang, Jinlu Zhang, Wenhao Yu
Pre-trained vision-language (V-L) models such as CLIP have shown excellent performance in many downstream cross-modal tasks.
no code implementations • 12 Mar 2024 • Pan Ting, Jianfeng Lin, Wenhao Yu, Wenlong Zhang, Xiaoying Chen, Jinlu Zhang, Binqiang Huang
Object counting is a challenging task with broad application prospects in security surveillance, traffic management, and disease diagnosis.