no code implementations • 11 Dec 2023 • Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
Instruction tuning data is essential for training Multimodal Large Language Models (MLLMs).
2 code implementations • 15 Sep 2023 • Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
Experiments on 19 visual transfer learning downstream tasks demonstrate that our SCT outperforms full fine-tuning on 18 of the 19 tasks while adding only 0.11M parameters to ViT-B, 780$\times$ fewer than its full fine-tuning counterpart.
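The 780$\times$ figure can be sanity-checked with back-of-the-envelope arithmetic: ViT-B has roughly 86M parameters (an approximation, not stated in the snippet above), so tuning only ~0.11M parameters instead of the full backbone gives a reduction on that order.

```python
# Illustrative check (not from the paper itself): compare the number of
# tuned parameters under full fine-tuning vs. the reported SCT budget.
vit_b_params = 86e6        # approximate ViT-B parameter count (assumption)
sct_added_params = 0.11e6  # added parameters reported for SCT

reduction = vit_b_params / sct_added_params
print(round(reduction))    # on the order of the reported 780x
```

This only verifies that the reported ratio is consistent with the stated parameter counts; it says nothing about how SCT selects which parameters to tune.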