1 code implementation • 29 Mar 2024 • Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li
In this paper, we introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.
1 code implementation • 8 Feb 2024 • Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX.
Ranked #5 on Video Question Answering on MVBench
no code implementations • 9 Oct 2023 • Weifeng Lin, Ziheng Wu, Jiayu Chen, Wentao Yang, Mingxin Huang, Jun Huang, Lianwen Jin
Fine-tuning pre-trained Vision Transformers (ViT) has consistently demonstrated promising performance in the realm of visual recognition.
1 code implementation • ICCV 2023 • Weifeng Lin, Ziheng Wu, Jiayu Chen, Jun Huang, Lianwen Jin
Specifically, SMT with 11. 5M / 2. 4GFLOPs and 32M / 7. 7GFLOPs can achieve 82. 2% and 84. 3% top-1 accuracy on ImageNet-1K, respectively.