Search Results for author: Fanyi Pu

Found 3 papers, 2 papers with code

WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning

no code implementations • 6 May 2024 • Yuanhan Zhang, Kaichen Zhang, Bo Li, Fanyi Pu, Christopher Arif Setiadharma, Jingkang Yang, Ziwei Liu

Multimodal information, together with our knowledge, helps us to understand the complex and dynamic world.

Multiple-choice, Video Understanding (+1 more)

OtterHD: A High-Resolution Multi-modality Model

1 code implementation • 7 Nov 2023 • Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, Ziwei Liu

In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision.

Ranked #86 on Visual Question Answering on MM-Vet

Visual Question Answering

3,463

MIMIC-IT: Multi-Modal In-Context Instruction Tuning

2 code implementations • 8 Jun 2023 • Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu

We release the MIMIC-IT dataset, instruction-response collection pipeline, benchmarks, and the Otter model.

Ranked #88 on Visual Question Answering on MM-Vet

In-Context Learning, Visual Question Answering

3,463
