1 code implementation • 18 Mar 2024 • Zixin Zhu, Xuelu Feng, Dongdong Chen, Junsong Yuan, Chunming Qiao, Gang Hua
We hypothesize that the latent representation learned from a pretrained generative T2V model encapsulates rich semantics and coherent temporal correspondences, thereby naturally facilitating video understanding.
Referring Video Object Segmentation Semantic Segmentation +2
2 code implementations • 7 Jun 2023 • Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua
The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged.
1 code implementation • 21 Nov 2022 • Zixin Zhu, Yixuan Wei, JianFeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu
The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one.
1 code implementation • ICCV 2021 • Zixin Zhu, Wei Tang, Le Wang, Nanning Zheng, Gang Hua
We explore two existing models to be the P-Net in our experiments.