1 code implementation • 2 Jun 2024 • Timothy Ossowski, Junjie Hu
Recent generalist vision-language models (VLMs) have demonstrated impressive reasoning capabilities across diverse multimodal tasks.
1 code implementation • 4 Apr 2024 • Harmon Bhasin, Timothy Ossowski, Yiqiao Zhong, Junjie Hu
Large language models (LLMs) have recently shown an extraordinary ability to perform unseen tasks from few-shot examples provided as text, a capability known as in-context learning (ICL).
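As a toy illustration of the ICL setup described above, the sketch below assembles a few-shot prompt from input-output demonstrations followed by an unanswered query. The sentiment-labeling task and the idea of sending the prompt to a completion endpoint are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of in-context learning (ICL): the model receives a few
# input-output demonstrations as plain text, then a new query, and is asked
# to continue the pattern. Task and prompt format are illustrative only.

few_shot_examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
]
query = "The soundtrack was gorgeous and the plot kept me hooked."

# Serialize the demonstrations, then append the unanswered query.
prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in few_shot_examples)
prompt += f"Review: {query}\nSentiment:"

print(prompt)
# A real experiment would pass `prompt` to an LLM's completion endpoint and
# read off the generated label; no gradient updates are involved.
```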
1 code implementation • 20 Jan 2024 • Timothy Ossowski, Ming Jiang, Junjie Hu
Vision-language models such as CLIP have shown impressive capabilities in encoding texts and images into aligned embeddings, enabling the retrieval of multimodal data in a shared embedding space; a minimal retrieval sketch follows this entry.
Ranked #22 on Visual Reasoning on Winoground
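As an illustration of the shared-embedding retrieval described above, here is a minimal sketch using the Hugging Face `transformers` CLIP wrappers. The checkpoint name, placeholder image, and candidate captions are assumptions for demonstration, not details from the paper.

```python
# Minimal sketch of CLIP-style retrieval: encode candidate texts and an image
# into the shared embedding space and rank the texts by similarity to the image.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

captions = ["a photo of a dog", "a photo of a cat", "a diagram of a circuit"]
image = Image.new("RGB", (224, 224))  # placeholder; use a real photo in practice

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds scaled cosine similarities between the image and each text.
probs = outputs.logits_per_image.softmax(dim=-1)
best = probs.argmax(dim=-1).item()
print(f"best caption: {captions[best]!r}")
```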
1 code implementation • 30 Jun 2023 • Timothy Ossowski, Junjie Hu
Recent years have witnessed impressive results from pre-trained vision-language models on knowledge-intensive tasks such as visual question answering (VQA).
1 code implementation • 23 May 2022 • Tuan Dinh, Jy-yong Sohn, Shashank Rajput, Timothy Ossowski, Yifei Ming, Junjie Hu, Dimitris Papailiopoulos, Kangwook Lee
Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods.
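A common ingredient in word translation without parallel corpora is aligning two independently trained monolingual embedding spaces with an orthogonal map. The sketch below shows the orthogonal Procrustes step on synthetic data; it illustrates the general technique, not the specific method of the paper above.

```python
# Generic sketch of the orthogonal Procrustes step used in many unsupervised
# word-translation pipelines: given matched source/target embedding matrices
# X, Y (synthetic here), find the orthogonal W minimizing ||XW - Y||_F, then
# translate each source word by nearest neighbor in the target space.

import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 200
X = rng.normal(size=(n, d))                      # "source-language" embeddings
true_rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
Y = X @ true_rotation                            # "target-language" embeddings

# Closed-form solution: W = U V^T from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# A source word is "translated" to its nearest target neighbor after mapping.
mapped = X @ W
sims = mapped @ Y.T / (np.linalg.norm(mapped, axis=1, keepdims=True)
                       * np.linalg.norm(Y, axis=1))
accuracy = (sims.argmax(axis=1) == np.arange(n)).mean()
print(f"retrieval accuracy on synthetic data: {accuracy:.2f}")
```

In real pipelines the matched pairs are not given; they are bootstrapped from an unsupervised initialization (e.g., adversarial alignment or a shared seed dictionary) and refined iteratively with this Procrustes step.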