1 code implementation • 11 Dec 2023 • Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
Multimodal Large Language Models (MLLMs) demonstrate exceptional problem-solving capabilities, but little research has examined their ability to generate data by converting unlabeled images into visual instruction tuning data.
2 code implementations • 15 Sep 2023 • Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
Experiments on 19 visual transfer learning downstream tasks demonstrate that our SCT outperforms full fine-tuning on 18 out of 19 tasks while adding only 0.11M parameters to ViT-B, 780$\times$ fewer than its full fine-tuning counterpart.