Search Results for author: Yupan Huang

Found 12 papers, 6 papers with code

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

no code implementations • 28 Nov 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

The diffusion model has proven to be a powerful generative model in recent years, yet generating visual text remains a challenge.

Language Modelling • Large Language Model • +1

Kosmos-2.5: A Multimodal Literate Model

no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei

We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images.

Reading Comprehension • Text Generation

Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

1 code implementation • 31 Aug 2023 • Yupan Huang, Zaiqiao Meng, Fangyu Liu, Yixuan Su, Nigel Collier, Yutong Lu

Our experiments validate the effectiveness of SparklesChat in understanding and reasoning across multiple images and dialogue turns.

Instruction Following • Visual Reasoning

TextDiffuser: Diffusion Models as Text Painters

no code implementations • NeurIPS 2023 • Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text.

Optical Character Recognition (OCR)

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

2 code implementations • 18 Apr 2022 • Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei

In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.

Ranked #1 on Key Information Extraction on EPHOIE

Document AI • Document Image Classification • +10

125,385

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation

1 code implementation • 19 Oct 2021 • Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu

In this work, we demonstrate such an AI creation system to produce both diverse captions and rich images.

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

1 code implementation • 19 Oct 2021 • Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu

We adopt Transformer as our unified architecture for its strong performance and task-agnostic design.

Text Generation • Text-to-Image Generation

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training

no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.

Question Answering • Relation • +5

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training

no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.

Question Answering • Relation • +3

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

3 code implementations • CVPR 2021 • Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu

As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from paired natural languages.

Ranked #5 on Visual Entailment on SNLI-VE val

Representation Learning • Retrieval • +3

206

Reinforcing Short-Length Hashing

no code implementations • 24 Apr 2020 • Xingbo Liu, Xiushan Nie, Qi Dai, Yupan Huang, Yilong Yin

Due to the compelling efficiency in retrieval and storage, similarity-preserving hashing has been widely applied to approximate nearest neighbor search in large-scale image retrieval.

Image Retrieval • Retrieval

Decoupling Localization and Classification in Single Shot Temporal Action Detection

1 code implementation • 16 Apr 2019 • Yupan Huang, Qi Dai, Yutong Lu

Each branch produces a set of action anchor layers by applying deconvolution to the feature maps of the main stream.

Ranked #26 on Temporal Action Localization on THUMOS’14

Action Detection • Classification • +2
