Search Results for author: Xinghua Jiang

Found 6 papers, 1 paper with code

HRVDA: High-Resolution Visual Document Assistant

no code implementations • 10 Apr 2024 • Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu

In addition, we construct a document-oriented visual instruction tuning dataset and apply a multi-stage training strategy to enhance the model's document modeling capabilities.

document understanding

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

no code implementations • 29 Feb 2024 • Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun

This indicates that contrastive learning between the visual holistic representations and the multimodal fine-grained features of document objects can help the vision encoder acquire more effective visual cues, thereby enhancing the comprehension of text-rich documents in LVLMs.

Contrastive Learning document understanding

AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

no code implementations • 14 Aug 2023 • Zhaohui Li, Haitao Wang, Xinghua Jiang

In our experiments, we treat discrete acoustic codes as textual data and train a masked language model using a cloze-like methodology, ultimately deriving high-quality audio representations.

Audio Classification Language Modelling +1

OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification

no code implementations • 4 Jul 2022 • Ye Liu, Lingfeng Qiao, Di Yin, Zhuoxuan Jiang, Xinghua Jiang, Deqiang Jiang, Bo Ren

In this paper, we take an alternative perspective to overcome the above challenges: we unify these two tasks into one by predicting shot links, where a link connects two adjacent shots and indicates that they belong to the same scene or category.

Scene Segmentation

The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

no code implementations • 18 Apr 2022 • Hao Liu, Xinghua Jiang, Xin Li, Antai Guo, Deqiang Jiang, Bo Ren

The self-supervised Masked Image Modeling (MIM) schema, which follows a "mask-and-reconstruct" pipeline of recovering content from masked images, has recently attracted increasing interest in the multimedia community, owing to its excellent ability to learn visual representations from unlabeled data.

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition

1 code implementation • CVPR 2022 • Hao Liu, Xinghua Jiang, Xin Li, Zhimin Bao, Deqiang Jiang, Bo Ren

To trade off efficiency against performance, a group of works performs the self-attention (SA) operation only within local patches, abandoning the global contextual information that would be indispensable for visual recognition tasks.

Object Detection +1
