Search Results for author: Cha Zhang

Found 24 papers, 12 papers with code

XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding

no code implementations • Findings (ACL) 2022 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities.

document understanding

Paper
Add Code

Kosmos-2.5: A Multimodal Literate Model

no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei

We present Kosmos-2. 5, a multimodal literate model for machine reading of text-intensive images.

Reading Comprehension Text Generation

Paper
Add Code

From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding

no code implementations • 23 May 2023 • Li Sun, Florian Luisier, Kayhan Batmanghelich, Dinei Florencio, Cha Zhang

Current state-of-the-art models for natural language understanding require a preprocessing step to convert raw text into discrete tokens.

Language Modelling Natural Language Understanding

Paper
Add Code

Diffusion-based Document Layout Generation

no code implementations • 19 Mar 2023 • Liu He, Yijuan Lu, John Corring, Dinei Florencio, Cha Zhang

Our empirical analysis shows that our diffusion-based approach is comparable to or outperforming other previous methods for layout generation across various document datasets.

Paper
Add Code

Unifying Vision, Text, and Layout for Universal Document Processing

2 code implementations • CVPR 2023 • Zineng Tang, ZiYi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal

UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.

Ranked #5 on Visual Question Answering (VQA) on InfographicVQA (using extra training data)

document understanding Image Reconstruction +1

1,645

Paper
Code

XDoc: Unified Pre-training for Cross-Format Document Understanding

1 code implementation • 6 Oct 2022 • Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei

The surge of pre-training has witnessed the rapid development of document understanding recently.

Ranked #7 on Semantic entity labeling on FUNSD

document understanding Semantic entity labeling

18,825

Paper
Code

Understanding Long Documents with Different Position-Aware Attentions

no code implementations • 17 Aug 2022 • Hai Pham, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang

Despite several successes in document understanding, the practical task for long document understanding is largely under-explored due to several challenges in computation and how to efficiently absorb long multimodal input.

document understanding Position

Paper
Add Code

DiT: Self-supervised Pre-training for Document Image Transformer

3 code implementations • 4 Mar 2022 • Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.

Ranked #1 on Table Detection on ICDAR 2019

Document AI Document Image Classification +4

127,132

Paper
Code

Improving Structured Text Recognition with Regular Expression Biasing

no code implementations • 10 Nov 2021 • Baoguang Shi, WenFeng Cheng, Yijuan Lu, Cha Zhang, Dinei Florencio

We study the problem of recognizing structured text, i. e. text that follows certain formats, and propose to improve the recognition accuracy of structured text by specifying regular expressions (regexes) for biasing.

Decoder

Paper
Add Code

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

2 code implementations • 21 Sep 2021 • Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei

Text recognition is a long-standing research problem for document digitalization.

Ranked #3 on Handwritten Text Recognition on IAM

Handwritten Text Recognition Language Modelling +4

127,132

Paper
Code

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

6 code implementations • 18 Apr 2021 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Ranked #13 on Document Image Classification on RVL-CDIP

Document Image Classification document understanding

127,132

Paper
Code

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

5 code implementations • ACL 2021 • Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Ranked #1 on Key Information Extraction on SROIE

Document Image Classification Document Layout Analysis +6

127,132

Paper
Code

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

1 code implementation • CVPR 2021 • Zhengyuan Yang, Yijuan Lu, JianFeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo

Due to this aligned representation learning, even pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5. 4%, compared with a non-TAP baseline.

Caption Generation Language Modelling +5

Paper
Code

Multimodal active speaker detection and virtual cinematography for video conferencing

no code implementations • 10 Feb 2020 • Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle

Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting and zooming of a video conferencing camera: users subjectively rate an expert video cinematographer's video significantly higher than unedited video.

4k BIG-bench Machine Learning

Paper
Add Code

Renofeation: A Simple Transfer Learning Method for Improved Adversarial Robustness

1 code implementation • 7 Feb 2020 • Ting-Wu Chin, Cha Zhang, Diana Marculescu

Fine-tuning through knowledge transfer from a pre-trained model on a large-scale dataset is a widely spread approach to effectively build models on small-scale datasets.

Adversarial Attack Adversarial Robustness +1

Paper
Code

Towards Efficient Model Compression via Learned Global Ranking

1 code implementation • CVPR 2020 • Ting-Wu Chin, Ruizhou Ding, Cha Zhang, Diana Marculescu

First, both the accuracy and the speed of ConvNets can affect the performance of the application.

Model Compression

112

Paper
Code

RePr: Improved Training of Convolutional Filters

1 code implementation • CVPR 2019 • Aaditya Prakash, James Storer, Dinei Florencio, Cha Zhang

We show that by temporarily pruning and then restoring a subset of the model's filters, and repeating this process cyclically, overlap in the learned features is reduced, producing improved generalization.

Paper
Code

Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks

1 code implementation • 1 Oct 2018 • Ting-Wu Chin, Cha Zhang, Diana Marculescu

Resource-efficient convolution neural networks enable not only the intelligence on edge devices but also opportunities in system-level optimization such as scheduling.

Meta-Learning Scheduling

112

Paper
Code

Orthogonal and Idempotent Transformations for Learning Deep Neural Networks

no code implementations • 19 Jul 2017 • Jingdong Wang, Yajie Xing, Kexin Zhang, Cha Zhang

Identity transformations, used as skip-connections in residual networks, directly connect convolutional layers close to the input and those close to the output in deep neural networks, improving information flow and thus easing the training.

Paper
Add Code

Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution

7 code implementations • 3 Aug 2016 • Emad Barsoum, Cha Zhang, Cristian Canton Ferrer, Zhengyou Zhang

Crowd sourcing has become a widely adopted scheme to collect ground truth labels.

Facial Expression Recognition Facial Expression Recognition (FER) +1

587

Paper
Code

Precision Enhancement of 3D Surfaces from Multiple Compressed Depth Maps

no code implementations • 25 Feb 2014 • Pengfei Wan, Gene Cheung, Philip A. Chou, Dinei Florencio, Cha Zhang, Oscar C. Au

In texture-plus-depth representation of a 3D scene, depth maps from different camera viewpoints are typically lossily compressed via the classical transform coding / coefficient quantization paradigm.

Quantization

Paper
Add Code

Video Enhancement of People Wearing Polarized Glasses: Darkening Reversal and Reflection Reduction

no code implementations • CVPR 2013 • Mao Ye, Cha Zhang, Ruigang Yang

With the wide-spread of consumer 3D-TV technology, stereoscopic videoconferencing systems are emerging.

General Classification Video Enhancement

Paper
Add Code

Wide-Baseline Hair Capture Using Strand-Based Refinement

no code implementations • CVPR 2013 • Linjie Luo, Cha Zhang, Zhengyou Zhang, Szymon Rusinkiewicz

We propose a novel algorithm to reconstruct the 3D geometry of human hairs in wide-baseline setups using strand-based refinement.

Paper
Add Code

Multiple-Instance Pruning For Learning Efficient Cascade Detectors

no code implementations • NeurIPS 2007 • Cha Zhang, Paul A. Viola

Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection.

Face Detection Multiple Instance Learning

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.