1 code implementation • 12 Mar 2024 • Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro
Therefore, we present a new LLVM, Mixture of All Intelligence (MoAI), which leverages auxiliary visual information obtained from the outputs of external segmentation, detection, SGG, and OCR models.
Ranked #27 on Visual Question Answering on MM-Vet
1 code implementation • 17 Feb 2024 • Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro
Our findings reveal that the image understanding capabilities of current VLMs are strongly correlated with their zero-shot performance on vision language (VL) tasks.
Ranked #35 on Visual Question Answering on MM-Vet