1 code implementation • 30 Apr 2024 • Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai
Specifically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter that effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters.
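The adapter idea can be sketched in a minimal form. Below is a generic bottleneck adapter (down-project, nonlinearity, up-project, residual) in plain Python; the actual Prompt Queries Generation Module and Tasks-aware Adapter designs are not detailed in this snippet, so treat this only as the common pattern such parameter-efficient modules build on.

```python
def matmul(x, w):
    # naive matrix multiply for illustration: x is (rows x d), w is (d x k)
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def bottleneck_adapter(x, w_down, w_up):
    """Generic bottleneck adapter: down-project, ReLU, up-project, residual.
    (Hypothetical sketch; not the paper's Tasks-aware Adapter design.)"""
    h = [[max(v, 0.0) for v in row] for row in matmul(x, w_down)]  # down + ReLU
    out = matmul(h, w_up)                                          # up-projection
    return [[a + b for a, b in zip(xr, orow)] for xr, orow in zip(x, out)]

x = [[1.0, 2.0, 3.0, 4.0]]                                  # one token, dim 4
w_down = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.0], [0.0, 0.1]]   # 4 -> 2 bottleneck
w_up = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]         # 2 -> 4
print(bottleneck_adapter(x, w_down, w_up))
```

Only the small `w_down`/`w_up` matrices are trained, which is how adapters keep the added parameter count minimal.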
2 code implementations • 12 Mar 2024 • Weijia Wu, Zhuang Li, YuChao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang
We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation.
1 code implementation • 25 Feb 2024 • Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, YangWei Ying, Hong Zhou
Our framework outperforms previous methods by approximately 1% for 8-bit PTQ and 2% for 6-bit PTQ, showcasing its superior performance.
1 code implementation • 31 Jan 2024 • Yuzhong Zhao, Yue Liu, Zonghao Guo, Weijia Wu, Chen Gong, Fang Wan, Qixiang Ye
The multimodal model is constrained to generate captions within a few sub-spaces containing the control words, which increases the chance of generating less frequent captions and alleviates the caption degeneration issue.
Ranked #1 on Dense Captioning on Visual Genome
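The control-word idea can be illustrated with a toy sketch. In the actual method the decoder's search space is constrained during generation; the `constrained_captions` helper below is hypothetical and only mimics the effect post hoc, by keeping candidate captions that contain every control word.

```python
def constrained_captions(candidates, control_words):
    """Keep only captions whose words include every control word,
    mimicking decoding restricted to sub-spaces containing those words.
    (Hypothetical helper; the real method constrains the decoder itself.)"""
    return [c for c in candidates
            if all(w in c.split() for w in control_words)]

candidates = [
    "a man riding a horse",
    "a man in a straw hat riding a chestnut horse",
    "a person outdoors",
]
print(constrained_captions(candidates, ["straw", "chestnut"]))
# ['a man in a straw hat riding a chestnut horse']
```

Forcing rarer control words into the output steers generation away from the most frequent generic captions.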
1 code implementation • 29 Nov 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Lianlei Shan, Hong Zhou, Mike Zheng Shou
Image segmentation based on continual learning exhibits a critical drop in performance, mainly due to catastrophic forgetting and background shift, as models are required to incorporate new classes continually.
1 code implementation • 24 Nov 2023 • Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang
In this paper, we introduce an information-enriched diffusion model for the paragraph-to-image generation task, termed ParaDiffusion, which explores transferring the extensive semantic comprehension capabilities of large language models to image generation.
1 code implementation • 12 Oct 2023 • Rui Zhao, YuChao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou
Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion.
no code implementations • 7 Oct 2023 • Luoming Zhang, Wen Fei, Weijia Wu, Yefei He, Zhenyu Lou, Hong Zhou
Fine-grained quantization incurs smaller quantization loss and consequently achieves superior performance.
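A quick numerical sketch of why finer granularity lowers quantization loss: with one scale per small group, an outlier value only degrades its own group instead of the whole tensor. The symmetric uniform quantizer and the group size below are generic illustrations, not the paper's actual scheme.

```python
def quantize(vals, bits):
    # symmetric uniform quantization, scale set by the max magnitude
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in vals) / qmax
    return [round(v / scale) * scale for v in vals]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# a weight vector with one outlier: the per-tensor scale is dominated by it
w = [0.01, -0.02, 0.03, -0.01, 0.02, -0.03, 0.01, 8.0]
per_tensor = quantize(w, 4)

# fine-grained: one scale per group of 4 values
groups = [w[i:i + 4] for i in range(0, len(w), 4)]
per_group = [q for g in groups for q in quantize(g, 4)]

print(mse(w, per_group) < mse(w, per_tensor))  # True: smaller loss
```

The outlier-free group gets a tight scale and near-lossless 4-bit values, while per-tensor quantization rounds all the small weights to zero.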
1 code implementation • 5 Oct 2023 • Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
In this paper, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
1 code implementation • NeurIPS 2023 • Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen
To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.
1 code implementation • ICCV 2023 • Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan
During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings.
Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)
2 code implementations • NeurIPS 2023 • YuChao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou
Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community.
1 code implementation • 5 May 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai
Most existing cross-modal language-to-video retrieval (VR) research focuses on single-modal input from video, i.e., visual representation, while text is omnipresent in human environments and frequently critical to understanding video.
1 code implementation • 5 May 2023 • Yuzhong Zhao, Weijia Wu, Zhuang Li, Jiahong Li, Weiqiang Wang
This paper introduces a novel video text synthesis technique called FlowText, which utilizes optical flow estimation to synthesize a large amount of text video data at a low cost for training robust video text spotters.
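A toy sketch of the core idea, propagating a text box label to the next frame with optical flow. The real pipeline warps per-pixel text regions using an estimated flow field; here `flow` is a hypothetical dense (dx, dy) displacement map and the box is simply shifted by the mean flow inside it.

```python
def propagate_box(box, flow):
    """Shift a text box (x0, y0, x1, y1) to the next frame using the
    mean optical flow inside it. (Simplified illustration; FlowText
    itself warps per-pixel with estimated flow, not a mean shift.)"""
    x0, y0, x1, y1 = box
    pts = [(x, y) for x in range(x0, x1) for y in range(y0, y1)]
    dx = sum(flow[p][0] for p in pts) / len(pts)
    dy = sum(flow[p][1] for p in pts) / len(pts)
    return (x0 + round(dx), y0 + round(dy), x1 + round(dx), y1 + round(dy))

# constant flow of (+2, +1) everywhere in a 20x20 frame
flow = {(x, y): (2, 1) for x in range(20) for y in range(20)}
print(propagate_box((3, 4, 8, 7), flow))  # (5, 5, 10, 8)
```

Because the annotations move with the rendered text, every synthesized frame comes with labels for free, which is what makes the data cheap to produce.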
no code implementations • 10 Apr 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai
In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video with various scenarios.
1 code implementation • ICCV 2023 • Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen
In contrast, synthetic data can be freely obtained using a generative model (e.g., DALL-E, Stable Diffusion).
no code implementations • 10 Feb 2023 • Yu Li, Yi Zhang, Weijia Wu, Zimu Zhou, Qiang Li
Such personalized opening sentence generation is challenging because (i) there are limited historical samples for conversation topic recommendation in online insurance sales and (ii) existing text generation schemes often fail to support customized topic ordering based on user preferences.
no code implementations • ICCV 2023 • Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.
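For context, the standard scaled binarization that such methods build on can be sketched as follows; the paper's Softmax-aware variant additionally adapts to the data distribution feeding the softmax, which is not reproduced here.

```python
def binarize(weights):
    """Binarize to {-a, +a}, where a = mean(|w|) is the L2-optimal scale
    (the classic XNOR-Net choice). Softmax-aware Binarization further
    adapts to the input distribution, which this sketch does not model."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

w = [0.5, -0.3, 0.1, -0.7]
print([round(v, 6) for v in binarize(w)])  # [0.4, -0.4, 0.4, -0.4]
```

Keeping only signs plus one scalar per tensor is what lets binarized networks replace multiplications with sign flips, at the cost of the binarization error the paper targets.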
no code implementations • 4 Jul 2022 • Yuzhong Zhao, Yuanqiang Cai, Weijia Wu, Weiqiang Wang
Generally, pre-training and long training schedules are necessary to obtain a well-performing text detector based on deep networks.
no code implementations • 16 May 2022 • Yefei He, Luoming Zhang, Weijia Wu, Hong Zhou
Extensive experiments demonstrate that the proposed method yields surprisingly strong performance on both image classification and human pose estimation tasks.
Ranked #1 on Binarization on ImageNet (Top 1 Accuracy metric)
no code implementations • 8 Apr 2022 • Yefei He, Luoming Zhang, Weijia Wu, Hong Zhou
In this paper, we present a simple yet effective data-free quantization method with accurate activation clipping and adaptive batch normalization.
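The activation-clipping idea can be sketched numerically: clipping rare outliers before choosing the quantization scale preserves resolution for the bulk of activations. The unsigned quantizer and the 0.4 clip threshold below are illustrative assumptions, not the paper's actual clipping rule.

```python
def quantize(vals, bits, clip):
    # unsigned uniform quantization for ReLU activations, range [0, clip]
    qmax = 2 ** bits - 1
    scale = clip / qmax
    return [round(min(max(v, 0.0), clip) / scale) * scale for v in vals]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

acts = [0.1, 0.2, 0.3, 0.15, 0.25, 0.05, 6.0]   # one rare outlier
naive = quantize(acts, 4, clip=max(acts))        # scale dominated by outlier
clipped = quantize(acts, 4, clip=0.4)            # tight clip on the tail

# clipping sacrifices the outlier but keeps the bulk far more precise
inliers = acts[:-1]
print(mse(inliers, clipped[:-1]) < mse(inliers, naive[:-1]))  # True
```

The trade-off is deliberate: a large error on one rare activation costs less accuracy than rounding the entire typical range to a handful of coarse levels.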
1 code implementation • 20 Mar 2022 • Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo
Recent video text spotting methods usually require a three-stage pipeline, i.e., detecting text in individual images, recognizing the localized text, and tracking text streams with post-processing to generate the final results.
1 code implementation • 30 Dec 2021 • Zhuang Li, Weijia Wu, Mike Zheng Shou, Jiahong Li, Size Li, Zhongyuan Wang, Hong Zhou
Semantic representation is of great benefit to the video text tracking (VTT) task, which requires simultaneously classifying, detecting, and tracking texts in video.
3 code implementations • 9 Dec 2021 • Weijia Wu, Yuanqiang Cai, Debing Zhang, Sibo Wang, Zhuang Li, Jiahong Li, Yejun Tang, Hong Zhou
Most existing video text spotting benchmarks focus on evaluating a single language and scenario with limited data.
no code implementations • 10 Sep 2021 • Jue Wang, Haofan Wang, Jincan Deng, Weijia Wu, Debing Zhang
Additional rich unpaired single-modal text data is used to boost the generalization of the text branch.
1 code implementation • 26 Nov 2020 • Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo
For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% for the fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs.
1 code implementation • 3 Sep 2020 • Weijia Wu, Ning Lu, Enze Xie
To address the severe domain distribution mismatch, we propose a synthetic-to-real domain adaptation method for scene text detection, which transfers knowledge from synthetic data (source domain) to real data (target domain).
no code implementations • 18 Nov 2019 • Qiang Huang, Jianhui Bu, Weijian Xie, Shengwen Yang, Weijia Wu, Li-Ping Liu
Sentence matching is an essential task in QA systems and is usually reformulated as a Paraphrase Identification (PI) problem.
Ranked #13 on Paraphrase Identification on Quora Question Pairs (Accuracy metric)
no code implementations • 22 Apr 2019 • Weijia Wu, Jici Xing, Hong Zhou
In this paper, we propose a pixel-wise method named TextCohesion for scene text detection, which splits a text instance into five key components: a Text Skeleton and four Directional Pixel Regions.
Ranked #1 on Curved Text Detection on SCUT-CTW1500