1 code implementation • 16 Feb 2024 • Shengzhi Li, Rongyu Lin, Shichao Pei
In conclusion, we propose a distillation-based multi-modal alignment model that uses fine-grained annotations on a small dataset to reconcile the textual and visual performance of MLLMs, restoring and even boosting language capability after visual instruction tuning.
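The excerpt does not spell out the paper's distillation objective, but distillation-based alignment typically minimizes the divergence between a frozen teacher's output distribution and the tuned student's. As an illustrative sketch only (the function names `softmax` and `kd_loss`, the temperature value, and the teacher/student roles are assumptions, not the authors' exact formulation), a standard temperature-scaled knowledge-distillation loss looks like this:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits (numerically stable).
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in standard knowledge distillation.
    # Hypothetical roles: teacher = frozen language model, student = MLLM
    # after visual instruction tuning.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical logits yield zero loss; diverging from the teacher is penalized.
print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
```

Minimizing such a term against the original language model while training on visual instructions is one common way to keep text-only capability from degrading.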