no code implementations • 18 Oct 2023 • Siyu An, Ye Liu, Haoyuan Peng, Di Yin
Extracting structured information from videos is critical for numerous downstream applications in industry.
no code implementations • CVPR 2023 • Ye Liu, Lingfeng Qiao, Changchong Lu, Di Yin, Chen Lin, Haoyuan Peng, Bo Ren
An intuitive way to handle these two problems is to address them in two separate stages: aligning modalities followed by domain adaptation, or vice versa.
no code implementations • 14 Nov 2022 • Lingfeng Qiao, Chen Wu, Ye Liu, Haoyuan Peng, Di Yin, Bo Ren
In this paper, we propose a novel approach that grafts the video encoder from a pre-trained video-language model onto a generative pre-trained language model.