1 code implementation • 29 Apr 2024 • Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou
By drawing the granular classification and landscapes of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field.
1 code implementation • 21 Feb 2024 • Zechen Bai, Peng Chen, Xiaolan Peng, Lu Liu, Hui Chen, Mike Zheng Shou, Feng Tian
In our solution, a deep learning model was first trained to retarget the facial expression from input face images to virtual human faces by estimating the blendshape coefficients.
2 code implementations • 2 Feb 2024 • Zongbo Han, Zechen Bai, Haiyang Mei, Qianli Xu, Changqing Zhang, Mike Zheng Shou
Recent advancements in large vision-language models (LVLMs) have demonstrated impressive capability in visual information understanding with human language.
no code implementations • 20 Dec 2023 • Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou
Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity.
no code implementations • ICCV 2023 • Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He
In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization.
1 code implementation • ICCV 2023 • Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao
Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.
1 code implementation • ICCV 2021 • Zechen Bai, Yuta Nakashima, Noa Garcia
Have you ever looked at a painting and wondered what is the story behind it?
1 code implementation • CVPR 2021 • Zechen Bai, Zhigang Wang, Jian Wang, Di Hu, Errui Ding
Although achieving great success, most of them only use limited data from a single-source domain for model pre-training, making the rich labeled data insufficiently exploited.
no code implementations • 15 Jan 2020 • Li Wang, Zechen Bai, Yonghua Zhang, Hongtao Lu
SG and RWS are de-signed for the best use of recalled words.