no code implementations • 6 May 2024 • Yuanhan Zhang, Kaichen Zhang, Bo Li, Fanyi Pu, Christopher Arif Setiadharma, Jingkang Yang, Ziwei Liu
Multimodal information, together with our knowledge, helps us understand the complex and dynamic world.
1 code implementation • 7 Nov 2023 • Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, Ziwei Liu
In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision.
Ranked #86 on Visual Question Answering on MM-Vet
2 code implementations • 8 Jun 2023 • Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu
We release the MIMIC-IT dataset, instruction-response collection pipeline, benchmarks, and the Otter model.
Ranked #88 on Visual Question Answering on MM-Vet