Search Results for author: Haowei Liu

Found 5 papers, 2 papers with code

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

no code implementations • 1 Mar 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

In vision-language pre-training (VLP), masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment.

Representation Learning


Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

no code implementations • 26 Feb 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

In this work, we propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics and combines the strengths of latent and lexicon representations for video-text retrieval.

Retrieval, Text Retrieval (+1 more)


CriticBench: Benchmarking LLMs for Critique-Correct Reasoning

1 code implementation • 22 Feb 2024 • Zicheng Lin, Zhibin Gou, Tian Liang, Ruilin Luo, Haowei Liu, Yujiu Yang

Utilizing CriticBench, we evaluate and dissect the performance of 17 LLMs in generation, critique, and correction reasoning, i.e., GQC reasoning.

Benchmarking


NFT1000: A Visual Text Dataset For Non-Fungible Token Retrieval

no code implementations • 29 Jan 2024 • Shuxun Wang, Yunfei Lei, Ziqi Zhang, Wei Liu, Haowei Liu, Li Yang, Wenjuan Li, Bing Li, Weiming Hu

With the rise of 'Metaverse' and 'Web3.0', NFT (Non-Fungible Token) has emerged as a kind of pivotal digital asset, garnering significant attention.

Retrieval


mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

2 code implementations • 7 Nov 2023 • Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks.

Ranked #11 on Visual Question Answering (VQA) on InfiMM-Eval

Decoder, Language Modelling (+2 more)

