Search Results for author: Houqiang Li

Found 190 papers, 84 papers with code

Progressive Multi-modal Conditional Prompt Tuning

no code implementations • 18 Apr 2024 • Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li

Initialization is responsible for encoding the image and text using a VLM, followed by a feature filter that selects text features similar to the image.

TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding

1 code implementation • 15 Apr 2024 • Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li

The image overview stage provides a comprehensive understanding of the global scene information, and the coarse localization stage approximates the image area containing the answer based on the question asked.

Question Answering Visual Question Answering (VQA)

Cross-Lingual Transfer for Natural Language Inference via Multilingual Prompt Translator

no code implementations • 19 Mar 2024 • Xiaoyu Qiu, Yuechen Wang, Jiaxin Shi, Wengang Zhou, Houqiang Li

To efficiently transfer soft prompts, we propose a novel framework, Multilingual Prompt Translator (MPT), in which a multilingual prompt translator is introduced to properly process the crucial knowledge embedded in the prompt by changing language knowledge while retaining task knowledge.

Cross-Lingual Transfer Natural Language Inference

Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction

no code implementations • 18 Mar 2024 • Zhiyang Guo, Wengang Zhou, Li Li, Min Wang, Houqiang Li

To address the above problem, we propose a novel motion-aware enhancement framework for dynamic scene reconstruction, which mines useful motion cues from optical flow to improve different paradigms of dynamic 3DGS.

Optical Flow Estimation

GaussNav: Gaussian Splatting for Visual Navigation

no code implementations • 18 Mar 2024 • Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li

In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.

Visual Navigation

HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation

1 code implementation • 18 Mar 2024 • Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang

We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner.

Knowledge Distillation NER +1

Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval

no code implementations • 3 Mar 2024 • Yongchao Du, Min Wang, Wengang Zhou, Shuping Hui, Houqiang Li

To tackle the above problems, we propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning.

Image Retrieval Language Modelling +2

Structure Similarity Preservation Learning for Asymmetric Image Retrieval

1 code implementation • 1 Mar 2024 • Hui Wu, Min Wang, Wengang Zhou, Houqiang Li

The centroid vectors in the quantizer serve as anchor points in the embedding space of the gallery model to characterize its structure.

Image Retrieval Retrieval

Asymmetric Feature Fusion for Image Retrieval

no code implementations • CVPR 2023 • Hui Wu, Min Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li

Then, a dynamic mixer is introduced to aggregate these features into a compact embedding for efficient search.

Image Retrieval Retrieval

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

1 code implementation • 29 Feb 2024 • Hao Feng, Wendi Wang, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

In this work, we present DeepEraser, an effective deep network for generic text removal.

Sinkhorn Distance Minimization for Knowledge Distillation

1 code implementation • 27 Feb 2024 • Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wengang Zhou, Houqiang Li

We propose the Sinkhorn Knowledge Distillation (SinKD) that exploits the Sinkhorn distance to ensure a nuanced and precise assessment of the disparity between teacher and student distributions.

Knowledge Distillation

Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation

no code implementations • 25 Feb 2024 • Xiaohan Lei, Min Wang, Wengang Zhou, Li Li, Houqiang Li

As a new embodied vision task, Instance ImageGoal Navigation (IIN) aims to navigate to a specified object depicted by a goal image in an unexplored environment.

Navigate

Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression

no code implementations • 29 Jan 2024 • Xihua Sheng, Li Li, Dong Liu, Houqiang Li

With the SDD-based motion model and the fusion of long- and short-term temporal contexts, our proposed learned video codec can obtain more accurate inter prediction.

Motion Estimation MS-SSIM +2

Exploiting GPT-4 Vision for Zero-shot Point Cloud Understanding

no code implementations • 15 Jan 2024 • Qi Sun, Xiao Cui, Wengang Zhou, Houqiang Li

In this study, we tackle the challenge of classifying the object category in point clouds, which previous works like PointCLIP struggle to address due to the inherent limitations of the CLIP architecture.

Point Cloud Classification Robust classification +1

Passive Non-Line-of-Sight Imaging with Light Transport Modulation

no code implementations • 26 Dec 2023 • Jiarui Zhang, Ruixu Geng, Xiaolong Du, Yan Chen, Houqiang Li, Yang Hu

In this work, we propose NLOS-LTM, a novel passive NLOS imaging method that effectively handles multiple light transport conditions with a single network.

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

2 code implementations • 21 Dec 2023 • Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen

Extensive experiments on various zero-shot transfer tasks demonstrate that TinySAM significantly outperforms counterpart methods.

Knowledge Distillation Quantization

DanZero+: Dominating the GuanDan Game through Reinforcement Learning

1 code implementation • 5 Dec 2023 • Youpeng Zhao, Yudong Lu, Jian Zhao, Wengang Zhou, Houqiang Li

Applying artificial intelligence (AI) to card games has long been a well-explored subject in AI research.

Card Games reinforcement-learning

Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs

1 code implementation • 22 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li

Moreover, we curate a collection of text-rich images and prompt the text-only GPT-4 to generate 12K high-quality conversations, featuring textual locations within text-rich scenarios.

document understanding Instruction Following +3

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding

no code implementations • 20 Nov 2023 • Hao Feng, Qi Liu, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang

This work presents DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2,560 × 2,560 resolution.

document understanding Language Modelling +2

PersonMAE: Person Re-Identification Pre-Training with Masked AutoEncoders

no code implementations • 8 Nov 2023 • Hezhen Hu, Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Lu Yuan, Dong Chen, Houqiang Li

Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID).

Person Re-Identification

Progressive Recurrent Network for Shadow Removal

no code implementations • 1 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Li Li, Houqiang Li

To handle this issue, we consider removing the shadow in a coarse-to-fine fashion and propose a simple but effective Progressive Recurrent Network (PRNet).

Image Shadow Removal Shadow Removal

I²MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

no code implementations • 24 Oct 2023 • Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li

Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training.

Contrastive Learning Representation Learning

Accelerate Presolve in Large-Scale Linear Programming via Reinforcement Learning

no code implementations • 18 Oct 2023 • Yufei Kuang, Xijun Li, Jie Wang, Fangzhou Zhu, Meng Lu, Zhihai Wang, Jia Zeng, Houqiang Li, Yongdong Zhang, Feng Wu

Specifically, we formulate the routine design task as a Markov decision process and propose an RL framework with adaptive action sequences to generate high-quality presolve routines efficiently.

reinforcement-learning Reinforcement Learning (RL)

MSight: An Edge-Cloud Infrastructure-based Perception System for Connected Automated Vehicles

no code implementations • 8 Oct 2023 • Rusheng Zhang, Depu Meng, Shengyin Shen, Zhengxia Zou, Houqiang Li, Henry X. Liu

As vehicular communication and networking technologies continue to advance, infrastructure-based roadside perception emerges as a pivotal tool for connected automated vehicle (CAV) applications.

Trajectory Prediction

Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning

no code implementations • 7 Oct 2023 • Yuchen Yang, Houqiang Li, Yanfeng Wang, Yu Wang

In this study, we introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.

Hallucination In-Context Learning +1

Sign Language Translation with Iterative Prototype

no code implementations • ICCV 2023 • Huijie Yao, Wengang Zhou, Hao Feng, Hezhen Hu, Hao Zhou, Houqiang Li

Technically, IP-SLT consists of feature extraction, prototype initialization, and iterative prototype refinement.

Ranked #5 on Sign Language Translation on CSL-Daily

Sentence Sign Language Translation +1

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

no code implementations • 19 Aug 2023 • Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang

However, existing advanced algorithms are limited in their ability to effectively utilize the immense representation capabilities and rich world knowledge inherent to these large pre-trained models, and the beneficial connections among tasks in text-rich scenarios have not been sufficiently explored.

Instruction Following Text Detection +1

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning

no code implementations • ICCV 2023 • Hao Feng, Wendi Wang, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

To make the most of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning.

Representation Learning

Text-Only Training for Visual Storytelling

no code implementations • 17 Aug 2023 • Yuechen Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li

Visual storytelling aims to generate a narrative based on a sequence of images, necessitating both vision-language alignment and coherent story generation.

Informativeness Visual Storytelling

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

no code implementations • 16 Aug 2023 • Shengming Yin, Chenfei Wu, Jian Liang, Jie Shi, Houqiang Li, Gong Ming, Nan Duan

Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control in video generation.

Trajectory Modeling Video Generation

Masked Motion Predictors are Strong 3D Action Representation Learners

1 code implementation • ICCV 2023 • Yunyao Mao, Jiajun Deng, Wengang Zhou, Yao Fang, Wanli Ouyang, Houqiang Li

To be specific, the proposed MAMP takes as input the masked spatio-temporal skeleton sequence and predicts the corresponding temporal motion of the masked human joints.

Ranked #5 on Skeleton Based Action Recognition on NTU RGB+D 120

motion prediction Skeleton Based Action Recognition

Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection

1 code implementation • ICCV 2023 • Yufei Yin, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

These inaccurate high-scoring region proposals will mislead the training of subsequent refinement modules and thus hamper the detection performance.

Object object-detection +1

Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video

no code implementations • 8 Aug 2023 • Weichao Zhao, Hezhen Hu, Wengang Zhou, Li Li, Houqiang Li

Reconstructing interacting hands from monocular RGB data is a challenging task, as it involves many interfering factors, e.g., self- and mutual occlusion and similar textures.

AltFreezing for More General Video Face Forgery Detection

1 code implementation • CVPR 2023 • Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Houqiang Li

In this paper, we propose to capture both spatial and temporal artifacts in one model for face forgery detection.

Data Augmentation

VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision

no code implementations • 19 Jun 2023 • Xihua Sheng, Li Li, Dong Liu, Houqiang Li

Such compact representations need to be decoded back to pixels before being displayed to humans and, as usual, before being enhanced or analyzed by machine vision algorithms.

Motion Compensation Motion Estimation +2

Exploring Effective Mask Sampling Modeling for Neural Image Compression

no code implementations • 9 Jun 2023 • Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Specifically, a Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage.

Image Compression Self-Supervised Learning

MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning

1 code implementation • 3 Jun 2023 • Haolin Song, Mingxiao Feng, Wengang Zhou, Houqiang Li

Recent approaches have utilized self-supervised auxiliary tasks as representation learning to improve the performance and sample efficiency of vision-based reinforcement learning algorithms in single-agent settings.

Contrastive Learning Multi-agent Reinforcement Learning +2

Detect Any Shadow: Segment Anything for Video Shadow Detection

1 code implementation • 26 May 2023 • Yonghui Wang, Wengang Zhou, Yunyao Mao, Houqiang Li

The Segment Anything Model (SAM) has achieved great success in natural image segmentation.

Image Segmentation Semantic Segmentation +1

Hybrid and Collaborative Passage Reranking

1 code implementation • 16 May 2023 • Zongmeng Zhang, Wengang Zhou, Jiaxin Shi, Houqiang Li

In a passage retrieval system, the initial retrieval results may be unsatisfactory; they can be refined by a reranking scheme.

Passage Retrieval Retrieval

SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding

no code implementations • 8 May 2023 • Hezhen Hu, Weichao Zhao, Wengang Zhou, Houqiang Li

In our framework, the hand pose is regarded as a visual token, which is derived from an off-the-shelf detector.

Ranked #1 on Sign Language Recognition on WLASL

Self-Supervised Learning Sign Language Recognition +1

O-GNN: Incorporating Ring Priors into Molecular Modeling

1 code implementation • ICLR 2023 • Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models.

Ranked #1 on Graph Regression on PCQM4M-LSC (Validation MAE metric)

Graph Regression Molecular Property Prediction +3

DocMAE: Document Image Rectification via Self-supervised Representation Learning

1 code implementation • 20 Apr 2023 • Shaokai Liu, Hao Feng, Wengang Zhou, Houqiang Li, Cong Liu, Feng Wu

Tremendous efforts have been made on document image rectification, but how to learn effective representations of such distorted images is still under-explored.

Representation Learning Self-Supervised Learning

Deep Unrestricted Document Image Rectification

1 code implementation • 18 Apr 2023 • Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

To the best of our knowledge, this is the first learning-based method for the rectification of unrestricted document images.

Ranked #1 on Local Distortion on DocUNet

Local Distortion

Learning Transferable Pedestrian Representation from Multimodal Information Supervision

1 code implementation • 12 Apr 2023 • Liping Bao, Longhui Wei, Xiaoyu Qiu, Wengang Zhou, Houqiang Li, Qi Tian

Recent research on unsupervised person re-identification (reID) has demonstrated that pre-training on unlabeled person images achieves better performance on downstream reID tasks than pre-training on ImageNet.

Ranked #2 on Unsupervised Person Re-Identification on DukeMTMC-reID

Attribute Contrastive Learning +3

HandNeRF: Neural Radiance Fields for Animatable Interacting Hands

no code implementations • CVPR 2023 • Zhiyang Guo, Wengang Zhou, Min Wang, Li Li, Houqiang Li

We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands, enabling the rendering of photo-realistic images and videos for gesture animation from arbitrary views.

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

no code implementations • 22 Mar 2023 • Shengming Yin, Chenfei Wu, Huan Yang, JianFeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan

In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation.

Video Generation

Human Pose as Compositional Tokens

1 code implementation • CVPR 2023 • Zigang Geng, Chunyu Wang, Yixuan Wei, Ze Liu, Houqiang Li, Han Hu

Human pose is typically represented by a coordinate vector of body joints or their heatmap embeddings.

Ranked #1 on Pose Estimation on MPII Human Pose

Pose Estimation

Focus on Your Target: A Dual Teacher-Student Framework for Domain-adaptive Semantic Segmentation

no code implementations • ICCV 2023 • Xinyue Huo, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

Currently, a popular UDA framework is self-training, which endows the model with two-fold abilities: (i) learning reliable semantics from the labeled images in the source domain, and (ii) adapting to the target domain by generating pseudo labels on the unlabeled images.

Semantic Segmentation Unsupervised Domain Adaptation

DIRE for Diffusion-Generated Image Detection

1 code implementation • ICCV 2023 • Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, Houqiang Li

We find that existing detectors struggle to detect images generated by diffusion models, even if we include generated images from a specific diffusion model in their training data.

ROCO: A Roundabout Traffic Conflict Dataset

1 code implementation • 1 Mar 2023 • Depu Meng, Owen Sayer, Rusheng Zhang, Shengyin Shen, Houqiang Li, Henry X. Liu

With the traffic conflict data collected, we find that failure to yield to circulating vehicles when entering the roundabout is the largest contributing factor to traffic conflicts.

Traffic Accident Detection

BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization

no code implementations • 10 Feb 2023 • Weichao Zhao, Hezhen Hu, Wengang Zhou, Jiaxin Shi, Houqiang Li

In this work, we aim to leverage the success of BERT pre-training and to model domain-specific statistics in order to strengthen the sign language recognition (SLR) model.

Pseudo Label Sign Language Recognition

Recurrent Generic Contour-based Instance Segmentation with Progressive Learning

1 code implementation • 21 Jan 2023 • Hao Feng, Keyi Zhou, Wengang Zhou, Yufei Yin, Jiajun Deng, Qi Sun, Houqiang Li

It maintains a single estimate of the contour that is progressively deformed toward the object boundary.

Ranked #1 on Semantic Contour Prediction on Sbd val

Instance Segmentation Lane Detection +6

OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection

no code implementations • 13 Jan 2023 • Xiaomeng Chu, Jiajun Deng, Yuan Zhao, Jianmin Ji, Yu Zhang, Houqiang Li, Yanyong Zhang

To this end, we propose OA-BEV, a network that can be plugged into the BEV-based 3D object detection framework to bring out the objects by incorporating object-aware pseudo-3D features and depth features.

3D Object Detection Object +1

Motion Information Propagation for Neural Video Compression

no code implementations • CVPR 2023 • Linfeng Qi, Jiahao Li, Bin Li, Houqiang Li, Yan Lu

In addition to assisting frame coding at the current time step, the feature from context generation is propagated as a motion condition when coding the subsequent motion latent.

Video Compression

Multi-Agent Reinforcement Learning with Shared Resources for Inventory Management

no code implementations • 15 Dec 2022 • Yuandong Ding, Mingxiao Feng, Guozi Liu, Wei Jiang, Chuheng Zhang, Li Zhao, Lei Song, Houqiang Li, Yan Jin, Jiang Bian

In this paper, we consider the inventory management (IM) problem where we need to make replenishment decisions for a large number of stock keeping units (SKUs) to balance their supply and demand.

Management Multi-agent Reinforcement Learning +2

Hand-Object Interaction Image Generation

no code implementations • 28 Nov 2022 • Hezhen Hu, Weilun Wang, Wengang Zhou, Houqiang Li

In this work, we address a new task, i.e., hand-object interaction image generation, which aims to conditionally generate the hand-object image given the hand, the object, and their interaction status.

Image Generation Object

CLIP2GAN: Towards Bridging Text with the Latent Space of GANs

no code implementations • 28 Nov 2022 • YiXuan Wang, Wengang Zhou, Jianmin Bao, Weilun Wang, Li Li, Houqiang Li

The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network.

Attribute Image Generation +1

SinDiffusion: Learning a Diffusion Model from a Single Natural Image

1 code implementation • 22 Nov 2022 • Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches from a single natural image.

Ranked #1 on Image Generation on Places50

Denoising Image Generation +1

Stare at What You See: Masked Image Modeling without Reconstruction

no code implementations • CVPR 2023 • Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo

However, unlike low-level features such as pixel values, we argue that the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image. This raises a question: is reconstruction necessary in Masked Image Modeling (MIM) with a teacher model?

DanZero: Mastering GuanDan Game with Reinforcement Learning

no code implementations • 31 Oct 2022 • Yudong Lu, Jian Zhao, Youpeng Zhao, Wengang Zhou, Houqiang Li

We compare it with 8 baseline AI programs based on heuristic rules, and the results reveal the outstanding performance of DanZero.

Card Games reinforcement-learning +1

Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding

no code implementations • Findings (EMNLP) 2021 • Yuechen Wang, Wengang Zhou, Houqiang Li

In this work, we propose a novel candidate-free framework, the Fine-grained Semantic Alignment Network (FSAN), for weakly supervised temporal language grounding (TLG).

Sentence

UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior

1 code implementation • 15 Oct 2022 • Yonghui Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li

To this end, we propose UDoc-GAN, the first framework to address the problem of document illumination correction under the unpaired setting.

Geometric Representation Learning for Document Image Rectification

2 code implementations • 15 Oct 2022 • Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li

In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one.

Representation Learning

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

1 code implementation • 14 Sep 2022 • Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo

and 2) how to mitigate the impact of these factors?

Ranked #2 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Retrieval Text Retrieval +1

CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation

1 code implementation • 26 Aug 2022 • Yunyao Mao, Wengang Zhou, Zhenbo Lu, Jiajun Deng, Houqiang Li

In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem.

3D Action Recognition Knowledge Distillation +1

Low-Light Video Enhancement with Synthetic Event Guidance

no code implementations • 23 Aug 2022 • Lin Liu, Junfeng An, Jianzhuang Liu, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Low-light video enhancement (LLVE) is an important yet challenging task with many applications such as photographing and autonomous driving.

Autonomous Driving Image Enhancement +1

Neighbor Correspondence Matching for Flow-based Video Frame Synthesis

no code implementations • 14 Jul 2022 • Zhaoyang Jia, Yan Lu, Houqiang Li

Since the current frame is not available in video frame synthesis, NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.

Ranked #2 on Video Frame Interpolation on X4K1000FPS

4k Video Compression +1

Unified 2D and 3D Pre-Training of Molecular Representations

1 code implementation • 14 Jul 2022 • Jinhua Zhu, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

The model is pre-trained on three tasks: reconstruction of masked atoms and coordinates, 3D conformation generation conditioned on 2D graph, and 2D graph generation conditioned on 3D conformation.

Graph Generation Molecular Property Prediction +3

Semantic Image Synthesis via Diffusion Models

3 code implementations • 30 Jun 2022 • Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks compared with Generative Adversarial Nets (GANs).

Denoising Image Generation

Self-Adaptive Label Augmentation for Semi-supervised Few-shot Classification

no code implementations • 16 Jun 2022 • Xueliang Wang, Jianyu Cai, Shuiwang Ji, Houqiang Li, Feng Wu, Jie Wang

A major novelty of SALA is the task-adaptive metric, which can learn the metric adaptively for different tasks in an end-to-end fashion.

Classification

TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer

1 code implementation • 14 Jun 2022 • Jiajun Deng, Zhengyuan Yang, Daqing Liu, Tianlang Chen, Wengang Zhou, Yanyong Zhang, Houqiang Li, Wanli Ouyang

In addition, we devise a Language Conditioned Vision Transformer that removes external fusion modules and reuses the uni-modal ViT for vision-language fusion at the intermediate layers.

Visual Grounding

Stabilizing Voltage in Power Distribution Networks via Multi-Agent Reinforcement Learning with Transformer

1 code implementation • 8 Jun 2022 • Minrui Wang, Mingxiao Feng, Wengang Zhou, Houqiang Li

Using MARL algorithms to coordinate multiple control units in the grid, an approach that can handle rapid changes in power systems, has recently been widely studied for the active voltage control task.

Multi-agent Reinforcement Learning reinforcement-learning +2

A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation

no code implementations • 23 May 2022 • Weizhen Qi, Yeyun Gong, Yelong Shen, Jian Jiao, Yu Yan, Houqiang Li, Ruofei Zhang, Weizhu Chen, Nan Duan

To further illustrate the commercial value of our approach, we conduct experiments on three generation tasks in real-world advertising applications.

Question Generation Question-Generation +1

Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods

1 code implementation • 8 May 2022 • Qing Li, Wengang Zhou, Zhenbo Lu, Houqiang Li

Actor-critic Reinforcement Learning (RL) algorithms have achieved impressive performance in continuous control tasks.

Continuous Control Q-Learning +1

Multi-Target Active Object Tracking with Monte Carlo Tree Search and Target Motion Modeling

no code implementations • 7 May 2022 • Zheng Chen, Jian Zhao, Mingyu Yang, Wengang Zhou, Houqiang Li

In this work, we focus on multi-target active object tracking (AOT), in which there are multiple targets as well as multiple cameras in the environment.

Multi-agent Reinforcement Learning Object Tracking

LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning

no code implementations • 5 May 2022 • Mingyu Yang, Jian Zhao, Xunhan Hu, Wengang Zhou, Jiangcheng Zhu, Houqiang Li

In this way, agents dealing with the same subtask share their learning of specific abilities and different subtasks correspond to different specific abilities.

Multi-agent Reinforcement Learning reinforcement-learning +3

Estimation of Reliable Proposal Quality for Temporal Action Detection

1 code implementation • 25 Apr 2022 • Junshan Hu, Chaoxu Guo, Liansheng Zhuang, Biao Wang, Tiezheng Ge, Yuning Jiang, Houqiang Li

From the region perspective, we introduce the Region Evaluate Module (REM), which uses a new and efficient sampling method to build proposal feature representations that contain more contextual information than point features, in order to refine the category score and proposal boundary.

Action Detection

Domain-Agnostic Prior for Transfer Semantic Segmentation

no code implementations • CVPR 2022 • Xinyue Huo, Lingxi Xie, Hengtong Hu, Wengang Zhou, Houqiang Li, Qi Tian

Unsupervised domain adaptation (UDA) is an important topic in the computer vision community.

Representation Learning Semantic Segmentation +1

Paper

DouZero+: Improving DouDizhu AI by Opponent Modeling and Coach-guided Learning

1 code implementation • 6 Apr 2022 • Youpeng Zhao, Jian Zhao, Xunhan Hu, Wengang Zhou, Houqiang Li

Recent years have witnessed great breakthroughs by deep reinforcement learning (DRL) in various perfect and imperfect information games.

Paper
Code

Large-Scale Pre-training for Person Re-identification with Noisy Labels

2 code implementations • CVPR 2022 • Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen

Since these ID labels, automatically derived from tracklets, inevitably contain noise, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning.

Ranked #7 on Person Re-Identification on CUHK03

Contrastive Learning Multi-Object Tracking +3

Paper
Code

Learning Enriched Illuminants for Cross and Single Sensor Color Constancy

no code implementations • 21 Mar 2022 • Xiaodong Cun, Zhendong Wang, Chi-Man Pun, Jianzhuang Liu, Wengang Zhou, Xu Jia, Houqiang Li

Color constancy aims to restore the constant colors of a scene under different illuminants.

Color Constancy

Paper

CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning

1 code implementation • 16 Mar 2022 • Jian Zhao, Xunhan Hu, Mingyu Yang, Wengang Zhou, Jiangcheng Zhu, Houqiang Li

In this way, CTDS balances the full utilization of global observation during training and the feasibility of decentralized execution for online inference.

Multi-agent Reinforcement Learning reinforcement-learning +3

Paper
Code

Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents

1 code implementation • 16 Mar 2022 • Jian Zhao, Youpeng Zhao, Weixun Wang, Mingyu Yang, Xunhan Hu, Wengang Zhou, Jianye Hao, Houqiang Li

To the best of our knowledge, this work is the first to study unexpected crashes in multi-agent systems.

Multi-agent Reinforcement Learning reinforcement-learning +3

Paper
Code

TAPE: Task-Agnostic Prior Embedding for Image Restoration

no code implementations • 11 Mar 2022 • Lin Liu, Lingxi Xie, Xiaopeng Zhang, Shanxin Yuan, Xiangyu Chen, Wengang Zhou, Houqiang Li, Qi Tian

In this paper, we propose a novel approach that embeds a task-agnostic prior into a transformer.

Image Restoration

Paper

MVP: Multimodality-guided Visual Pre-training

no code implementations • 10 Mar 2022 • Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

Recently, masked image modeling (MIM) has become a promising direction for visual pre-training.

Language Modelling

Paper

Coordinate-Aligned Multi-Camera Collaboration for Active Multi-Object Tracking

1 code implementation • 22 Feb 2022 • Zeyu Fang, Jian Zhao, Mingyu Yang, Wengang Zhou, Zhenbo Lu, Houqiang Li

In our approach, we regard each camera as an agent and address AMOT with a multi-agent reinforcement learning solution.

Multi-agent Reinforcement Learning Multi-Object Tracking

Paper
Code

MCMARL: Parameterizing Value Function via Mixture of Categorical Distributions for Multi-Agent Reinforcement Learning

1 code implementation • 21 Feb 2022 • Jian Zhao, Mingyu Yang, Youpeng Zhao, Xunhan Hu, Wengang Zhou, Jiangcheng Zhu, Houqiang Li

Specifically, we model both individual Q-values and global Q-value with categorical distribution.

Multi-agent Reinforcement Learning Starcraft +1

Paper
Code

Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

no code implementations • 9 Feb 2022 • Jian Zhao, Yue Zhang, Xunhan Hu, Weixun Wang, Wengang Zhou, Jianye Hao, Jiangcheng Zhu, Houqiang Li

In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards.

Paper

Direct Molecular Conformation Generation

1 code implementation • 3 Feb 2022 • Jinhua Zhu, Yingce Xia, Chang Liu, Lijun Wu, Shufang Xie, Yusong Wang, Tong Wang, Tao Qin, Wengang Zhou, Houqiang Li, Haiguang Liu, Tie-Yan Liu

Molecular conformation generation aims to generate three-dimensional coordinates of all the atoms in a molecule and is an important task in bioinformatics and pharmacology.

Molecular Docking

Paper
Code

Representing Videos as Discriminative Sub-graphs for Action Recognition

no code implementations • CVPR 2021 • Dong Li, Zhaofan Qiu, Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

For each action category, we execute online clustering to decompose the graph into sub-graphs on each scale by learning a Gaussian Mixture Layer, and select the discriminative sub-graphs as action prototypes for recognition.

Action Recognition Graph Learning +1

Paper

Contextual Similarity Distillation for Asymmetric Image Retrieval

no code implementations • CVPR 2022 • Hui Wu, Min Wang, Wengang Zhou, Houqiang Li, Qi Tian

To this end, we propose a flexible contextual similarity distillation framework to enhance the small query model and keep its output features compatible with those of the large gallery model, which is crucial for asymmetric retrieval.

Image Retrieval Retrieval

Paper

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

no code implementations • 20 Dec 2021 • Yufei Kuang, Miao Lu, Jie Wang, Qi Zhou, Bin Li, Houqiang Li

Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators.

Paper

Radio-Assisted Human Detection

no code implementations • 16 Dec 2021 • Chengrun Qiu, Dongheng Zhang, Yang Hu, Houqiang Li, Qibin Sun, Yan Chen

In this paper, we propose a radio-assisted human detection framework by incorporating radio information into the state-of-the-art detection methods, including anchor-based one-stage detectors and two-stage detectors.

Human Detection Region Proposal

Paper

Learning Token-based Representation for Image Retrieval

1 code implementation • 12 Dec 2021 • Hui Wu, Min Wang, Wengang Zhou, Yang Hu, Houqiang Li

Next, a refinement block is introduced to enhance the visual tokens with self-attention and cross-attention.

Ranked #2 on Image Retrieval on RParis (Medium)

Image Retrieval Retrieval

Paper
Code

VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion

no code implementations • 29 Nov 2021 • Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Qiuyu Mao, Houqiang Li, Yanyong Zhang

However, this approach often suffers from the mismatch between the resolution of point clouds and RGB images, leading to sub-optimal performance.

3D Object Detection Data Augmentation +2

Paper

Dual Progressive Prototype Network for Generalized Zero-Shot Learning

no code implementations • NeurIPS 2021 • Chaoqun Wang, Shaobo Min, Xuejin Chen, Xiaoyan Sun, Houqiang Li

This enables DPPN to produce visual representations with accurate attribute localization ability, which benefits the semantic-visual alignment and representation transferability.

Attribute Generalized Zero-Shot Learning

Paper

Unsupervised Person Re-Identification with Wireless Positioning under Weak Scene Labeling

1 code implementation • 29 Oct 2021 • Yiheng Liu, Wengang Zhou, Qiaokang Xie, Houqiang Li

To this end, we propose to explore unsupervised person re-identification with both visual data and wireless positioning trajectories under weak scene labeling, in which we only need to know the locations of the cameras.

Scene Labeling Unsupervised Person Re-Identification

Paper
Code

DocScanner: Robust Document Image Rectification with Progressive Learning

3 code implementations • 28 Oct 2021 • Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li

The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.

Optical Character Recognition (OCR)

Paper
Code

Contextual Similarity Aggregation with Self-attention for Visual Re-ranking

1 code implementation • NeurIPS 2021 • Jianbo Ouyang, Hui Wu, Min Wang, Wengang Zhou, Houqiang Li

Since our re-ranking model is not directly involved with the visual feature used in the initial retrieval, it is ready to be applied to retrieval result lists obtained from various retrieval algorithms.

Content-Based Image Retrieval Data Augmentation +2

Paper
Code

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

2 code implementations • 25 Oct 2021 • Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li

Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer.

Optical Character Recognition (OCR)

Paper
Code

Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent

no code implementations • 11 Oct 2021 • Weiming Liu, Huacong Jiang, Bin Li, Houqiang Li

Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) are regret minimization algorithms for Online Convex Optimization (OCO); they are mathematically elegant but less practical for solving Extensive-Form Games (EFGs).

counterfactual

Paper

SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition

no code implementations • ICCV 2021 • Hezhen Hu, Weichao Zhao, Wengang Zhou, Yuechen Wang, Houqiang Li

To validate the effectiveness of our method on SLR, we perform extensive experiments on four public benchmark datasets, i.e., NMFs-CSL, SLR500, MSASL and WLASL.

Ranked #1 on Sign Language Recognition on WLASL100 (using extra training data)

Self-Supervised Learning Sign Language Recognition

Paper

Multi-Agent Reinforcement Learning with Shared Resource in Inventory Management

no code implementations • 29 Sep 2021 • Mingxiao Feng, Guozi Liu, Li Zhao, Lei Song, Jiang Bian, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

In this paper, we consider the inventory management (IM) problem for a single store with a large number of SKUs (stock keeping units), where we need to make replenishment decisions for each SKU to balance its supply and demand.

Management Multi-agent Reinforcement Learning +2

Paper

One-shot Key Information Extraction from Document with Deep Partial Graph Matching

no code implementations • 26 Sep 2021 • Minghong Yao, Zhiguang Liu, Liangwei Wang, Houqiang Li, Liansheng Zhuang

However, collecting and labeling a large dataset is time-consuming and is not a user-friendly requirement for many cloud platforms.

Graph Matching Key Information Extraction

Paper

Learning Fine-Grained Motion Embedding for Landscape Animation

no code implementations • 6 Sep 2021 • Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this problem, we propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding for Landscape Animation.

Paper

Discovering Representation Sprachbund For Multilingual Pre-Training

no code implementations • Findings (EMNLP) 2021 • Yimin Fan, Yaobo Liang, Alexandre Muzio, Hany Hassan, Houqiang Li, Ming Zhou, Nan Duan

Then we cluster all the target languages into multiple groups and name each group a representation sprachbund.

Multilingual NLP

Paper

Heredity-aware Child Face Image Generation with Latent Space Disentanglement

no code implementations • 25 Aug 2021 • Xiao Cui, Wengang Zhou, Yang Hu, Weilun Wang, Houqiang Li

The main idea is to disentangle the latent space of a pre-trained generation model and precisely control the face attributes of child images with clear semantics.

Disentanglement Image Generation

Paper

Conditional DETR for Fast Training Convergence

3 code implementations • ICCV 2021 • Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang

Our approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention.

Object object-detection +1

Paper
Code

Instance-wise Hard Negative Example Generation for Contrastive Learning in Unpaired Image-to-Image Translation

no code implementations • ICCV 2021 • Weilun Wang, Wengang Zhou, Jianmin Bao, Dong Chen, Houqiang Li

In this paper, we uncover that the negative examples play a critical role in the performance of contrastive learning for image translation.

Contrastive Learning Image-to-Image Translation +1

Paper

Joint Inductive and Transductive Learning for Video Object Segmentation

1 code implementation • ICCV 2021 • Yunyao Mao, Ning Wang, Wengang Zhou, Houqiang Li

In this work, we propose to integrate transductive and inductive learning into a unified framework to exploit the complementarity between them for accurate and robust video object segmentation.

Ranked #4 on Semi-Supervised Video Object Segmentation on DAVIS (no YouTube-VOS training)

Object Semantic Segmentation +3

Paper
Code

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

1 code implementation • 30 Jul 2021 • Jiajun Deng, Wengang Zhou, Yanyong Zhang, Houqiang Li

To this end, in this work, we regard point clouds as hollow-3D data and propose a new architecture, namely Hallucinated Hollow-3D R-CNN ($\text{H}^2$3D R-CNN), to address the problem of 3D object detection.

3D Object Detection object-detection +1

Paper
Code

Supervised Off-Policy Ranking

1 code implementation • 3 Jul 2021 • Yue Jin, Yue Zhang, Tao Qin, Xudong Zhang, Jian Yuan, Houqiang Li, Tie-Yan Liu

Inspired by the two observations, in this work, we study a new problem, supervised off-policy ranking (SOPR), which aims to rank a set of target policies based on supervised learning by leveraging off-policy data and policies with known performance.

Off-policy evaluation

Paper
Code

Revisiting Knowledge Distillation: An Inheritance and Exploration Framework

1 code implementation • CVPR 2021 • Zhen Huang, Xu Shen, Jun Xing, Tongliang Liu, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xian-Sheng Hua

The inheritance part is learned with a similarity loss to transfer the existing learned knowledge from the teacher model to the student model, while the exploration part is encouraged to learn representations different from the inherited ones with a dissimilarity loss.

Knowledge Distillation

Paper
Code

Weakly Supervised Temporal Adjacent Network for Language Grounding

1 code implementation • 30 Jun 2021 • Yuechen Wang, Jiajun Deng, Wengang Zhou, Houqiang Li

To this end, we introduce a novel weakly supervised temporal adjacent network (WSTAN) for temporal language grounding.

Multiple Instance Learning Sentence

Paper
Code

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training

no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.

Question Answering Relation +5

Paper

Multi-Modal 3D Object Detection in Autonomous Driving: a Survey

no code implementations • 24 Jun 2021 • Yingjie Wang, Qiuyu Mao, Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Houqiang Li, Yanyong Zhang

In this survey, we first introduce the background of popular sensors used for self-driving, their data properties, and the corresponding object detection algorithms.

3D Object Detection Autonomous Driving +3

Paper

Model-Aware Gesture-to-Gesture Translation

no code implementations • CVPR 2021 • Hezhen Hu, Weilun Wang, Wengang Zhou, Weichao Zhao, Houqiang Li

Then, a transformation flow is calculated based on the correspondence between the source and target topology maps.

Gesture-to-Gesture Translation Sign Language Production +1

Paper

ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Image Segmentation

no code implementations • CVPR 2021 • Xinyue Huo, Lingxi Xie, Jianzhong He, Zijie Yang, Wengang Zhou, Houqiang Li, Qi Tian

Semi-supervised learning is a useful tool for image segmentation, mainly due to its ability to extract knowledge from unlabeled data to assist learning from labeled data.

Continual Learning Image Segmentation +3

Paper

Dual-view Molecule Pre-training

1 code implementation • 17 Jun 2021 • Jinhua Zhu, Yingce Xia, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

After pre-training, we can use either the Transformer branch (this one is recommended according to empirical results), the GNN branch, or both for downstream tasks.

Ranked #1 on Molecular Property Prediction on HIV dataset

Molecular Property Prediction Property Prediction +2

Paper
Code

Uformer: A General U-Shaped Transformer for Image Restoration

4 code implementations • CVPR 2022 • Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, Houqiang Li

Powered by these two designs, Uformer enjoys a high capability for capturing both local and global dependencies for image restoration.

Ranked #2 on Deblurring on RealBlur-R (trained on GoPro)

Deblurring Image Deblurring +5

Paper
Code

Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task

no code implementations • 1 Jun 2021 • Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

By simply pulling together different augmented views of each image, or through other novel mechanisms, they can learn rich knowledge without supervision and significantly improve the transfer performance of pre-training models.

Self-Supervised Learning

Paper

Improving Sign Language Translation with Monolingual Data by Sign Back-Translation

no code implementations • CVPR 2021 • Hao Zhou, Wengang Zhou, Weizhen Qi, Junfu Pu, Houqiang Li

Finally, the synthetic parallel data serves as a strong supplement for the end-to-end training of the encoder-decoder SLT framework.

Ranked #4 on Sign Language Translation on CSL-Daily

Sign Language Recognition Sign Language Translation +1

Paper

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training

no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.

Question Answering Relation +3

Paper

Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training

no code implementations • 19 Apr 2021 • Chenyi Lei, Shixian Luo, Yong Liu, Wanggui He, Jiamang Wang, Guoxin Wang, Haihong Tang, Chunyan Miao, Houqiang Li

The pre-trained neural models have recently achieved impressive performances in understanding multimodal content.

Contrastive Learning Language Modelling +2

Paper

TransVG: End-to-End Visual Grounding with Transformers

2 code implementations • ICCV 2021 • Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li

In this paper, we present a neat yet effective transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to the corresponding region in an image.

Ranked #14 on Referring Expression Comprehension on RefCOCO

Referring Expression Comprehension Visual Grounding

Paper
Code

ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation

1 code implementation • ACL 2021 • Weizhen Qi, Yeyun Gong, Yu Yan, Can Xu, Bolun Yao, Bartuer Zhou, Biao Cheng, Daxin Jiang, Jiusheng Chen, Ruofei Zhang, Houqiang Li, Nan Duan

ProphetNet is a pre-training based natural language generation method which shows powerful performance on English text summarization and question generation tasks.

Code Generation Open-Domain Dialog +4

Paper
Code

Task-Independent Knowledge Makes for Transferable Representations for Generalized Zero-Shot Learning

no code implementations • 5 Apr 2021 • Chaoqun Wang, Xuejin Chen, Shaobo Min, Xiaoyan Sun, Houqiang Li

First, DCEN leverages task labels to cluster representations of the same semantic category by cross-modal contrastive learning and exploring semantic-visual complementarity.

Contrastive Learning Generalized Zero-Shot Learning

Paper

Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE

2 code implementations • CVPR 2021 • Jialun Peng, Dong Liu, Songcen Xu, Houqiang Li

We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture.

Image Inpainting Quantization +1

Paper
Code

IOT: Instance-wise Layer Reordering for Transformer Structures

1 code implementation • ICLR 2021 • Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Based on this observation, in this work, we break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure.

Abstractive Text Summarization Code Generation +2

Paper
Code

Learning Deep Local Features With Multiple Dynamic Attentions for Large-Scale Image Retrieval

1 code implementation • ICCV 2021 • Hui Wu, Min Wang, Wengang Zhou, Houqiang Li

To this end, we propose a novel deep local feature learning architecture to simultaneously focus on multiple discriminative local patterns in an image.

Image Retrieval Metric Learning +1

Paper
Code

Consistent Instance Classification for Unsupervised Representation Learning

no code implementations • 1 Jan 2021 • Depu Meng, Zigang Geng, Zhirong Wu, Bin Xiao, Houqiang Li, Jingdong Wang

The proposed consistent instance classification (ConIC) approach simultaneously optimizes the classification loss and an additional consistency loss explicitly penalizing the feature dissimilarity between the augmented views from the same instance.

Classification General Classification +1

Paper

3D Local Convolutional Neural Networks for Gait Recognition

1 code implementation • ICCV 2021 • Zhen Huang, Dixiu Xue, Xu Shen, Xinmei Tian, Houqiang Li, Jianqiang Huang, Xian-Sheng Hua

Second, different body parts possess different scales, and even the same part in different frames can appear at different locations and scales.

Ranked #2 on Gait Recognition on OUMVLP

Gait Recognition

Paper
Code

BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining

1 code implementation • 31 Dec 2020 • Weizhen Qi, Yeyun Gong, Jian Jiao, Yu Yan, Weizhu Chen, Dayiheng Liu, Kewen Tang, Houqiang Li, Jiusheng Chen, Ruofei Zhang, Ming Zhou, Nan Duan

In this paper, we propose BANG, a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation.

Dialogue Generation Question Generation +1

Paper
Code

Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

5 code implementations • 31 Dec 2020 • Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, Houqiang Li

In this paper, we take a slightly different viewpoint -- we find that precise positioning of raw points is not essential for high performance 3D object detection and that the coarse voxel granularity can also offer sufficient detection accuracy.

Ranked #4 on 3D Object Detection on KITTI Cars Moderate val

3D Object Detection object-detection +2

Paper
Code

Contrastive Transformation for Self-supervised Correspondence Learning

1 code implementation • 9 Dec 2020 • Ning Wang, Wengang Zhou, Houqiang Li

It is worth mentioning that our method also surpasses the fully-supervised affinity representation (e.g., ResNet) and performs competitively against the recent fully-supervised algorithms designed for the specific tasks (e.g., VOT and VOS).

Self-Supervised Learning Semantic Segmentation +3

Paper
Code

Unsupervised Pre-training for Person Re-identification

1 code implementation • CVPR 2021 • Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen

In this paper, we present a large scale unlabeled person re-identification (Re-ID) dataset "LUPerson" and make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.

Ranked #1 on Person Re-Identification on Market-1501 (using extra training data)

Data Augmentation Person Re-Identification +1

Paper
Code

Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method

no code implementations • NeurIPS 2020 • Qi Zhou, Yufei Kuang, Zherui Qiu, Houqiang Li, Jie Wang

However, in continuous action spaces, integrating entropy regularization with expressive policies is challenging and usually requires complex inference procedures.

Continuous Control reinforcement-learning +1

Paper

Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition

1 code implementation • 26 Nov 2020 • Zhen Huang, Xu Shen, Xinmei Tian, Houqiang Li, Jianqiang Huang, Xian-Sheng Hua

The topology of the adjacency graph is a key factor for modeling the correlations of the input skeletons.

Action Recognition Skeleton Based Action Recognition +1

Paper
Code

Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations

no code implementations • 19 Nov 2020 • Xinyue Huo, Lingxi Xie, Longhui Wei, Xiaopeng Zhang, Hao Li, Zijie Yang, Wengang Zhou, Houqiang Li, Qi Tian

Contrastive learning has achieved great success in self-supervised visual representation learning, but existing approaches mostly ignore spatial information, which is often crucial for visual representation.

Contrastive Learning Data Augmentation +1

Paper

Can Semantic Labels Assist Self-Supervised Visual Representation Learning?

no code implementations • 17 Nov 2020 • Longhui Wei, Lingxi Xie, Jianzhong He, Jianlong Chang, Xiaopeng Zhang, Wengang Zhou, Houqiang Li, Qi Tian

Recently, contrastive learning has largely advanced the progress of unsupervised visual representation learning.

Contrastive Learning Representation Learning +1

Paper

ProphetNet-Ads: A Looking Ahead Strategy for Generative Retrieval Models in Sponsored Search Engine

no code implementations • 21 Oct 2020 • Weizhen Qi, Yeyun Gong, Yu Yan, Jian Jiao, Bo Shao, Ruofei Zhang, Houqiang Li, Nan Duan, Ming Zhou

We build a dataset from a real-world sponsored search engine and carry out experiments to analyze different generative retrieval models.

Retrieval

Paper

Masked Contrastive Representation Learning for Reinforcement Learning

1 code implementation • 15 Oct 2020 • Jinhua Zhu, Yingce Xia, Lijun Wu, Jiajun Deng, Wengang Zhou, Tao Qin, Houqiang Li

During inference, the CNN encoder and the policy network are used to take actions, and the Transformer module is discarded.

Atari Games Contrastive Learning +3

Paper
Code

Boosting Continuous Sign Language Recognition via Cross Modality Augmentation

no code implementations • 11 Oct 2020 • Junfu Pu, Wengang Zhou, Hezhen Hu, Houqiang Li

Continuous sign language recognition (SLR) deals with unaligned video-text pairs and uses the word error rate (WER), i.e., edit distance, as the main evaluation metric.
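WER here is the word-level edit distance between the recognized gloss sequence and the reference, divided by the number of reference words. A minimal sketch, assuming unit costs for substitution, insertion, and deletion (the standard Levenshtein definition); `word_error_rate` is a hypothetical helper for illustration, not code from the paper.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Word-level edit distance divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    m = len(hypothesis)
    # prev[j]: edit distance between the reference prefix processed so far
    # and hypothesis[:j]; row 0 compares an empty reference with each prefix.
    prev = list(range(m + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * m
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref_word
                curr[j - 1] + 1,     # insert hyp_word
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[m] / len(reference)


ref = "I GO SCHOOL TOMORROW".split()
hyp = "I SCHOOL TOMORROW RAIN".split()
print(word_error_rate(ref, hyp))  # 0.5
```

In the usage line, aligning the two sequences needs one deletion (GO) and one insertion (RAIN), so WER = 2/4 = 0.5. Because insertions are counted, WER can exceed 1.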

Sentence Sign Language Recognition

Paper

Improving Person Re-identification with Iterative Impression Aggregation

no code implementations • 21 Sep 2020 • Dengpan Fu, Bo Xin, Jingdong Wang, Dong-Dong Chen, Jianmin Bao, Gang Hua, Houqiang Li

Not only does such a simple method improve the performance of the baseline models, it also achieves comparable performance with latest advanced re-ranking methods.

Person Re-Identification Re-Ranking

Paper

Global-local Enhancement Network for NMFs-aware Sign Language Recognition

no code implementations • 24 Aug 2020 • Hezhen Hu, Wengang Zhou, Junfu Pu, Houqiang Li

Sign language recognition (SLR) is a challenging problem, involving complex manual features, i.e., hand gestures, and fine-grained non-manual features (NMFs), i.e., facial expression, mouth shapes, etc.

Sign Language Recognition

Paper

Vision Meets Wireless Positioning: Effective Person Re-identification with Recurrent Context Propagation

1 code implementation • 10 Aug 2020 • Yiheng Liu, Wengang Zhou, Mao Xi, Sanjing Shen, Houqiang Li

Existing person re-identification methods rely on the visual sensor to capture the pedestrians.

Person Re-Identification

Paper
Code

Unsupervised Deep Representation Learning for Real-Time Tracking

1 code implementation • 22 Jul 2020 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Wei Liu, Houqiang Li

The advancement of visual tracking has continuously been brought by deep learning models.

Representation Learning Visual Tracking

Paper
Code

Single Shot Video Object Detector

1 code implementation • 7 Jul 2020 • Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos.

Object object-detection +2

Paper
Code

Efficient Integer-Arithmetic-Only Convolutional Neural Networks

1 code implementation • 21 Jun 2020 • Hengrui Zhao, Dong Liu, Houqiang Li

Considering the tradeoff between activation quantization error and network learning ability, we set an empirical rule to tune the bound of each Bounded ReLU.
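A Bounded ReLU clips activations to the interval [0, b], so the bound b fixes the range an integer quantizer has to cover. The NumPy sketch below illustrates the tradeoff mentioned above, assuming a simple uniform unsigned quantizer; `quantize_activation` and its `num_bits` parameter are hypothetical illustrations, not the paper's exact scheme.

```python
import numpy as np


def bounded_relu(x: np.ndarray, bound: float) -> np.ndarray:
    """Zero for negative inputs, identity on [0, bound], saturating at bound."""
    return np.clip(x, 0.0, bound)


def quantize_activation(x: np.ndarray, bound: float, num_bits: int = 8):
    """Hypothetical uniform quantizer mapping [0, bound] onto 0..2**num_bits - 1."""
    levels = 2 ** num_bits - 1
    scale = bound / levels  # step size grows linearly with the bound
    q = np.round(bounded_relu(x, bound) / scale).astype(np.int32)
    return q, scale


x = np.linspace(0.0, 6.0, 101)
q, scale = quantize_activation(x, bound=6.0)
# For values inside [0, bound], rounding error is at most half a step.
print(np.max(np.abs(q * scale - x)) <= scale / 2 + 1e-9)
```

A larger bound clips fewer activations but enlarges the step `scale` and therefore the rounding error; a smaller bound does the opposite. That is the tension a per-layer bound has to balance.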

Image Super-Resolution Quantization

Paper
Code

Cascaded Regression Tracking: Towards Online Hard Distractor Discrimination

no code implementations • 18 Jun 2020 • Ning Wang, Wengang Zhou, Qi Tian, Houqiang Li

In the second stage, a discrete sampling based ridge regression is designed to double-check the remaining ambiguous hard samples, which serves as an alternative to fully-connected layers and benefits from the closed-form solver for efficient learning.

regression Visual Tracking

Paper

M-LVC: Multiple Frames Prediction for Learned Video Compression

1 code implementation • CVPR 2020 • Jianping Lin, Dong Liu, Houqiang Li, Feng Wu

To compensate for the compression error of the auto-encoders, we further design an MV refinement network and a residual refinement network, which also make use of the multiple reference frames.

MS-SSIM SSIM +1

Paper
Code

Long Short-Term Relation Networks for Video Action Detection

no code implementations • 31 Mar 2020 • Dong Li, Ting Yao, Zhaofan Qiu, Houqiang Li, Tao Mei

It has been well recognized that modeling human-object or object-object relations is helpful for the detection task.

Action Detection Object +2

Paper

Incorporating BERT into Neural Machine Translation

3 code implementations • ICLR 2020 • Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

While BERT is more commonly used for fine-tuning than as a contextual embedding in downstream language understanding tasks, our preliminary exploration in NMT shows that using BERT as a contextual embedding works better than using it for fine-tuning.

Ranked #1 on Unsupervised Machine Translation on WMT2014 English-French

Natural Language Understanding NMT +5

Paper
Code

Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

no code implementations • 8 Feb 2020 • Hao Zhou, Wengang Zhou, Yun Zhou, Houqiang Li

Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module.

Ranked #4 on Sign Language Recognition on RWTH-PHOENIX-Weather 2014 T

Pose Estimation Sign Language Recognition

Soft Hindsight Experience Replay

2 code implementations • 6 Feb 2020 • Qiwei He, Liansheng Zhuang, Houqiang Li

However, due to the brittleness of deterministic methods, HER and its variants typically face major challenges in stability and convergence, which significantly affect the final performance.

reinforcement-learning Reinforcement Learning (RL)

A Generalization Theory based on Independent and Task-Identically Distributed Assumption

no code implementations • 28 Nov 2019 • Guanhua Zheng, Jitao Sang, Houqiang Li, Jian Yu, Changsheng Xu

The derived generalization bound based on the ITID assumption identifies the significance of hypothesis invariance in guaranteeing generalization performance.

Image Classification

Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

1 code implementation • 28 Nov 2019 • Qi Zhou, Houqiang Li, Jie Wang

In this paper, we propose Policy Optimization with Model-Based Uncertainty (POMBU), a novel model-based approach that can effectively improve asymptotic performance by using the uncertainty in Q-values.

Model-based Reinforcement Learning reinforcement-learning

Quantization Networks

1 code implementation • CVPR 2019 • Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xian-Sheng Hua

The proposed quantization function can be learned in a lossless and end-to-end manner and works for any weights and activations of neural networks in a simple and uniform way.
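
One way to make a quantization function learnable end to end is to replace the hard staircase with a sum of shifted sigmoids whose sharpness β is raised during training, so the function is differentiable early on and approaches a true step function at the end. The sketch below assumes that formulation; the thresholds b_i, step sizes s_i, offset, and scale α are illustrative values rather than learned ones, and the function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (only ever exponentiates z <= 0)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_quantize(x: float, thresholds: Sequence[float], steps: Sequence[float],
                  offset: float, beta: float, alpha: float = 1.0) -> float:
    """Differentiable staircase: alpha * (sum_i s_i * sigmoid(beta * (x - b_i)) - offset)."""
    total = sum(s * sigmoid(beta * (x - b)) for s, b in zip(steps, thresholds))
    return alpha * (total - offset)


def hard_quantize(x: float, thresholds: Sequence[float], steps: Sequence[float],
                  offset: float, alpha: float = 1.0) -> float:
    """Limit beta -> infinity: every sigmoid becomes a unit step."""
    total = sum(s for s, b in zip(steps, thresholds) if x > b)
    return alpha * (total - offset)


# Illustrative ternary quantizer with output levels {-1, 0, +1}
B, S, O = [-0.5, 0.5], [1.0, 1.0], 1.0
for beta in (1.0, 10.0, 1000.0):
    print(beta, [round(soft_quantize(v, B, S, O, beta), 3) for v in (-0.9, 0.3, 0.9)])
print("hard", [hard_quantize(v, B, S, O) for v in (-0.9, 0.3, 0.9)])
```

At small β the outputs are smooth blends of neighbouring levels; at large β they coincide with the hard quantizer, which is what lets the same function be used for training and deployment.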

Image Classification object-detection

AETv2: AutoEncoding Transformations for Self-Supervised Representation Learning by Minimizing Geodesic Distances in Lie Groups

no code implementations • 16 Nov 2019 • Feng Lin, Haohang Xu, Houqiang Li, Hongkai Xiong, Guo-Jun Qi

For this reason, we should use the geodesic to characterize how an image transforms along the manifold of a transformation group, and adopt its length to measure the deviation between transformations.
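
The difference between a geodesic and a straight-line distance is easy to see on a simple Lie group. The sketch below assumes the 3-D rotation group SO(3) as a stand-in for the transformation groups the paper works with: the geodesic distance between rotations R1 and R2 is the angle of the relative rotation R1ᵀR2, i.e. arccos((tr(R1ᵀR2) − 1)/2), while the Frobenius distance between the matrices ignores the group's curvature. All helper names are hypothetical.

```python
import math
from typing import List

Mat3 = List[List[float]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Mat3) -> Mat3:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_z(theta: float) -> Mat3:
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def geodesic_distance(r1: Mat3, r2: Mat3) -> float:
    """Length of the shortest path on SO(3): the rotation angle of R1^T R2."""
    rel = matmul(transpose(r1), r2)
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.acos(cos_angle)


def frobenius_distance(r1: Mat3, r2: Mat3) -> float:
    """Straight-line distance in the ambient space of 3x3 matrices."""
    return math.sqrt(sum((r1[i][j] - r2[i][j]) ** 2 for i in range(3) for j in range(3)))


for angle in (0.5, 1.5, 3.0):
    a, b = rot_z(0.0), rot_z(angle)
    print(angle, geodesic_distance(a, b), frobenius_distance(a, b))
```

The geodesic distance grows linearly with the rotation angle θ, whereas the Frobenius distance equals 2√2·sin(θ/2) and flattens out as θ approaches π, so it under-reports large deviations.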

Representation Learning Self-Supervised Learning

Progressive Unsupervised Person Re-identification by Tracklet Association with Spatio-Temporal Regularization

1 code implementation • 25 Oct 2019 • Qiaokang Xie, Wengang Zhou, Guo-Jun Qi, Qi Tian, Houqiang Li

In our approach, we first collect tracklet data within each camera by automatic person detection and tracking.

Human Detection Representation Learning

An End-to-End Foreground-Aware Network for Person Re-Identification

no code implementations • 25 Oct 2019 • Yiheng Liu, Wengang Zhou, Jianzhuang Liu, Guo-Jun Qi, Qi Tian, Houqiang Li

By presenting a target attention loss, the pedestrian features extracted from the foreground branch become less sensitive to backgrounds, which greatly reduces the negative impact of changing backgrounds when matching the same identity across different camera views.

Person Re-Identification

Learn Interpretable Word Embeddings Efficiently with von Mises-Fisher Distribution

no code implementations • 25 Sep 2019 • Minghong Yao, Liansheng Zhuang, Houqiang Li, Jian Yang, Shafei Wang

Results show that our model consistently outperforms the dominant models on these tasks.

Word Embeddings Word Similarity

Relation Distillation Networks for Video Object Detection

2 code implementations • ICCV 2019 • Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

In this paper, we introduce a new design to capture the interactions across the objects in spatio-temporal context.

Object object-detection

Real-Time Correlation Tracking via Joint Model Compression and Transfer

1 code implementation • 23 Jul 2019 • Ning Wang, Wengang Zhou, Yibing Song, Chao Ma, Houqiang Li

In the distillation process, we propose a fidelity loss to enable the student network to maintain the representation capability of the teacher network.

Computational Efficiency Image Classification

Progressive Learning of Low-Precision Networks

no code implementations • 28 May 2019 • Zhengguang Zhou, Wengang Zhou, Xutao Lv, Xuan Huang, Xiaoyu Wang, Houqiang Li

Recent years have witnessed great advances of deep learning in a variety of vision tasks.

Online Filter Clustering and Pruning for Efficient Convnets

no code implementations • 28 May 2019 • Zhengguang Zhou, Wengang Zhou, Richang Hong, Houqiang Li

Pruning filters is an effective method for accelerating deep neural networks (DNNs), but most existing approaches prune filters directly on a pre-trained network, which limits the achievable acceleration.
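
To make the idea of pruning redundant filters concrete, the sketch below groups flattened filters by cosine similarity and keeps one representative per group. It is a simplified, offline stand-in: the paper clusters filters online during training, and the greedy assignment rule, the 0.95 threshold, and the function names here are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def cluster_filters(filters: List[List[float]],
                    threshold: float = 0.95) -> Tuple[List[int], List[int]]:
    """Greedy clustering: a filter joins the first cluster whose representative
    is at least `threshold`-similar; otherwise it starts a new cluster.
    Returns (representative indices to keep, cluster id per filter)."""
    reps: List[int] = []
    assignment: List[int] = []
    for i, f in enumerate(filters):
        for cid, r in enumerate(reps):
            if cosine(f, filters[r]) >= threshold:
                assignment.append(cid)
                break
        else:  # no sufficiently similar representative: start a new cluster
            reps.append(i)
            assignment.append(len(reps) - 1)
    return reps, assignment


filters = [[1.0, 0.0, 0.0], [2.0, 0.01, 0.0], [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0], [0.0, 0.98, 0.05]]
keep, groups = cluster_filters(filters)
print(keep, groups)
```

Only the filters listed in `keep` would survive pruning; in a real network the next layer's weights would also need to be adjusted for the removed channels, which this sketch does not attempt.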

Clustering

Deep Learning-Based Video Coding: A Review and A Case Study

1 code implementation • 29 Apr 2019 • Dong Liu, Yue Li, Jianping Lin, Houqiang Li, Feng Wu

For deep schemes, pixel probability modeling and auto-encoders are the two main approaches, which can be viewed as predictive coding and transform coding schemes, respectively.

Multimedia Image and Video Processing

Unsupervised Deep Tracking

1 code implementation • CVPR 2019 • Ning Wang, Yibing Song, Chao Ma, Wengang Zhou, Wei Liu, Houqiang Li

We propose an unsupervised visual tracking method in this paper.

Visual Tracking

Spatial and Temporal Mutual Promotion for Video-based Person Re-identification

1 code implementation • 26 Dec 2018 • Yiheng Liu, Zhenxun Yuan, Wengang Zhou, Houqiang Li

How to explore the abundant spatial-temporal information in video sequences is the key to solving this problem.

Video-Based Person Re-Identification

Affinity Derivation and Graph Merge for Instance Segmentation

1 code implementation • ECCV 2018 • Yiding Liu, Siyu Yang, Bin Li, Wengang Zhou, Jizheng Xu, Houqiang Li, Yan Lu

We present an instance segmentation scheme based on pixel affinity information, i.e., the relationship indicating whether two pixels belong to the same instance.
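
A much-simplified version of turning pairwise affinities into instances is a thresholded union-find merge. The sketch assumes each pixel has predicted affinities only to its right and lower neighbours (`aff_right`, `aff_down`, a hypothetical layout) and a fixed 0.5 threshold; the paper's multi-scale affinities and its graph-merge ordering are not modelled.

```python
from typing import Dict, List


def find(parent: List[int], i: int) -> int:
    """Return the root of i's set, halving the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def merge_by_affinity(aff_right: List[List[float]], aff_down: List[List[float]],
                      height: int, width: int, threshold: float = 0.5) -> List[List[int]]:
    """Union neighbouring pixels whose predicted affinity exceeds `threshold`.

    aff_right[y][x]: affinity between (y, x) and (y, x + 1)
    aff_down[y][x]:  affinity between (y, x) and (y + 1, x)
    """
    parent = list(range(height * width))

    def union(a: int, b: int) -> None:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra

    for y in range(height):
        for x in range(width):
            idx = y * width + x
            if x + 1 < width and aff_right[y][x] > threshold:
                union(idx, idx + 1)
            if y + 1 < height and aff_down[y][x] > threshold:
                union(idx, idx + width)

    # Relabel set roots as consecutive instance ids in raster order.
    ids: Dict[int, int] = {}
    return [[ids.setdefault(find(parent, y * width + x), len(ids)) for x in range(width)]
            for y in range(height)]


aff_right = [[0.9, 0.1, 0.9], [0.9, 0.1, 0.9]]  # weak link between columns 1 and 2
aff_down = [[0.9, 0.9, 0.9, 0.9]]
labels = merge_by_affinity(aff_right, aff_down, height=2, width=4)
print(labels)
```

The low affinity between the second and third columns keeps the two halves of the image apart, producing two instances.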

Instance Segmentation Semantic Segmentation

In Defense of the Classification Loss for Person Re-Identification

1 code implementation • 16 Sep 2018 • Yao Zhai, Xun Guo, Yan Lu, Houqiang Li

The recent research for person re-identification has been focused on two trends.

Classification General Classification

Multi-Cue Correlation Filters for Robust Visual Tracking

1 code implementation • CVPR 2018 • Ning Wang, Wengang Zhou, Qi Tian, Richang Hong, Meng Wang, Houqiang Li

By combining different types of features, our approach constructs multiple experts through Discriminative Correlation Filter (DCF) and each of them tracks the target independently.
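
For readers unfamiliar with DCF trackers, the sketch below shows a single correlation-filter "expert" on a 1-D signal in the MOSSE style: the filter is learned in closed form in the Fourier domain, H* = (G · F*) / (F · F* + λ), and the target shift is read off as the peak of the response on a new sample. The 1-D signal, the naive O(n²) DFT, the Gaussian label, and the single feature type are simplifying assumptions; the paper combines several feature types into multiple experts.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: Sequence[complex]) -> List[complex]:
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gaussian_label(n: int, sigma: float = 1.0) -> List[float]:
    """Desired response: a Gaussian peak at index 0 (circular distance)."""
    return [math.exp(-min(t, n - t) ** 2 / (2 * sigma ** 2)) for t in range(n)]


def train_filter(f: List[float], g: List[float], lam: float = 1e-2) -> List[complex]:
    """Closed-form filter in the Fourier domain: H* = G F* / (F F* + lam)."""
    F, G = dft(f), dft(g)
    return [Gk * Fk.conjugate() / (Fk * Fk.conjugate() + lam) for Fk, Gk in zip(F, G)]


def detect(h_conj: List[complex], z: List[float]) -> int:
    """Return the index of the correlation-response peak, i.e. the estimated shift."""
    Z = dft(z)
    response = [r.real for r in idft([Zk * Hk for Zk, Hk in zip(Z, h_conj)])]
    return max(range(len(response)), key=response.__getitem__)


f = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0, 0.0]  # appearance of the target
h = train_filter(f, gaussian_label(len(f)))
z = [f[(t - 3) % len(f)] for t in range(len(f))]  # same target, shifted by 3
print(detect(h, z))
```

The regularizer λ keeps the division stable at frequencies where the training sample has little energy; a multi-expert tracker would run several such filters on different features and arbitrate between their responses.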

Visual Tracking

Visual Attribute-augmented Three-dimensional Convolutional Neural Network for Enhanced Human Action Recognition

no code implementations • 8 May 2018 • Yunfeng Wang, Wengang Zhou, Qilin Zhang, Houqiang Li

Visual attributes in individual video frames, such as the presence of characteristic objects and scenes, offer substantial information for action recognition in videos.

Action Recognition In Videos Attribute

Low-Latency Human Action Recognition with Weighted Multi-Region Convolutional Neural Network

no code implementations • 8 May 2018 • Yunfeng Wang, Wengang Zhou, Qilin Zhang, Xiaotian Zhu, Houqiang Li

Termed "Weighted Multi-Region Convolutional Neural Network" (WMR ConvNet), the proposed system is LSTM-free and is based on a 2D ConvNet, which does not require the accumulation of video frames needed for 3D ConvNet filtering.

Action Recognition Chunking

To Create What You Tell: Generating Videos from Captions

no code implementations • 23 Apr 2018 • Yingwei Pan, Zhaofan Qiu, Ting Yao, Houqiang Li, Tao Mei

In this paper, we present novel Temporal GANs conditioned on Captions, namely TGANs-C, in which the input to the generator network, a concatenation of a latent noise vector and a caption embedding, is transformed into a frame sequence with 3D spatio-temporal convolutions.

Philosophy

Towards Open-Set Identity Preserving Face Synthesis

no code implementations • CVPR 2018 • Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, Gang Hua

We then recombine the identity vector and the attribute vector to synthesize a new face of the subject with the extracted attribute.

Attribute Face Generation

Video-based Sign Language Recognition without Temporal Segmentation

no code implementations • 30 Jan 2018 • Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, Weiping Li

Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data.

Segmentation Sentence

Feature Selective Networks for Object Detection

no code implementations • CVPR 2018 • Yao Zhai, Jingjing Fu, Yan Lu, Houqiang Li

The RoI-based sub-region attention map and aspect ratio attention map are selectively pooled from the banks, and then used to refine the original RoI features for RoI classification.

Object object-detection

Neural network-based arithmetic coding of intra prediction modes in HEVC

no code implementations • 18 Sep 2017 • Rui Song, Dong Liu, Houqiang Li, Feng Wu

In this paper, we propose an arithmetic coding strategy by training neural networks, and make preliminary studies on coding of the intra prediction modes in HEVC.
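
The sketch below shows how arithmetic coding turns per-symbol probabilities into a code: each symbol narrows an interval in proportion to its predicted probability, so a better predictor leaves a wider final interval and needs fewer bits (about −log₂ of the width). `toy_model` is a hypothetical stand-in for a trained network, the three-letter alphabet is illustrative rather than HEVC's intra-mode set, and exact rational arithmetic replaces the finite-precision integer coder a real encoder would use.

```python
import math
from fractions import Fraction
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[str]], Dict[str, Fraction]]


def toy_model(history: Sequence[str]) -> Dict[str, Fraction]:
    """Hypothetical predictor: uniform at first, then favours repeating the last symbol."""
    if not history:
        return {s: Fraction(1, 3) for s in "ABC"}
    return {s: Fraction(8, 10) if s == history[-1] else Fraction(1, 10) for s in "ABC"}


def encode(symbols: Sequence[str], model: Model) -> Tuple[Fraction, Fraction]:
    """Return (low, width) of the final interval inside [0, 1)."""
    low, width = Fraction(0), Fraction(1)
    for i, s in enumerate(symbols):
        probs = model(symbols[:i])
        # cumulative probability of all symbols ordered before s
        cum = sum((probs[t] for t in sorted(probs) if t < s), Fraction(0))
        low += width * cum
        width *= probs[s]
    return low, width


def decode(point: Fraction, length: int, model: Model) -> List[str]:
    """Replay the model and pick, at each step, the sub-interval containing `point`."""
    out: List[str] = []
    low, width = Fraction(0), Fraction(1)
    for _ in range(length):
        probs = model(out)
        cum = Fraction(0)
        for s in sorted(probs):
            if point < low + width * (cum + probs[s]):
                out.append(s)
                low += width * cum
                width *= probs[s]
                break
            cum += probs[s]
    return out


message = list("AAABAA")
low, width = encode(message, toy_model)
print(decode(low, len(message), toy_model), -math.log2(float(width)))
```

The decoder must run the same model on the same history as the encoder, which is why a learned predictor can only condition on already-decoded information.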

Multimedia

Recent Advance in Content-based Image Retrieval: A Literature Survey

no code implementations • 19 Jun 2017 • Wengang Zhou, Houqiang Li, Qi Tian

The explosive increase and ubiquitous accessibility of visual data on the Web have led to a surge of research activity in image search and retrieval.

Content-Based Image Retrieval Retrieval

CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training

3 code implementations • ICCV 2017 • Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, Gang Hua

Our approach models an image as a composition of label and latent attributes in a probabilistic model.

Attribute Data Augmentation

A Convolutional Neural Network Approach for Half-Pel Interpolation in Video Coding

no code implementations • 10 Mar 2017 • Ning Yan, Dong Liu, Houqiang Li, Feng Wu

To further improve the coding efficiency, sub-pel motion compensation has been utilized, which requires interpolation of fractional samples.
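
For context, the sketch below applies HEVC's own fixed 8-tap luma half-sample filter, with taps (−1, 4, −11, 40, 40, −11, 4, −1)/64, to one row of integer samples; a CNN-based interpolator would aim to improve on this kind of fixed filter. Edge replication and rounding straight back to 8-bit are simplifications of this sketch; the standard keeps higher intermediate precision and handles reference-picture padding in its own way.

```python
from typing import List

# HEVC luma half-sample interpolation taps; they sum to 64.
HALF_PEL_TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]


def half_pel_row(samples: List[int], bit_depth: int = 8) -> List[int]:
    """Interpolate the half-sample value between samples[i] and samples[i + 1]."""
    n = len(samples)
    max_val = (1 << bit_depth) - 1
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            j = min(max(i + k - 3, 0), n - 1)  # replicate edge samples (simplification)
            acc += tap * samples[j]
        val = (acc + 32) >> 6  # divide by 64 with rounding
        out.append(min(max(val, 0), max_val))  # clip to the valid sample range
    return out


row = [10 * i for i in range(10)]  # a linear ramp
print(half_pel_row(row))
```

Away from the edges, the symmetric filter reproduces a linear ramp exactly (the value halfway between 40 and 50 comes out as 45), while its negative taps sharpen edges relative to simple averaging.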

Multimedia

Convolutional Neural Network-Based Block Up-sampling for Intra Frame Coding

no code implementations • 22 Feb 2017 • Yue Li, Dong Liu, Houqiang Li, Li Li, Feng Wu, Hong Zhang, Haitao Yang

A block can be down-sampled before being compressed by normal intra coding, and then up-sampled to its original resolution.
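
The down-sample / up-sample round trip can be sketched as follows. The 2×2 averaging filter and nearest-neighbour up-sampling are assumed placeholders (the paper's contribution is a CNN-based up-sampler), and the intra coding of the low-resolution block in between is omitted; the reconstruction error shows what a better up-sampler has to recover.

```python
from typing import List

Block = List[List[float]]


def downsample2x(block: Block) -> Block:
    """Average each 2x2 neighbourhood (block dimensions assumed even)."""
    return [[(block[y][x] + block[y][x + 1] + block[y + 1][x] + block[y + 1][x + 1]) / 4.0
             for x in range(0, len(block[0]), 2)]
            for y in range(0, len(block), 2)]


def upsample2x_nearest(block: Block) -> Block:
    """Replicate every sample into a 2x2 patch."""
    out: Block = []
    for row in block:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def mse(a: Block, b: Block) -> float:
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


block = [[10, 20, 30, 40], [10, 20, 30, 40], [50, 60, 70, 80], [50, 60, 70, 80]]
low_res = downsample2x(block)
recon = upsample2x_nearest(low_res)
print(low_res, mse(block, recon))
```

Coding the quarter-size block costs fewer bits, so the scheme pays off when a learned up-sampler can keep this reconstruction error below the savings.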

Multimedia

Projection based advanced motion model for cubic mapping for 360-degree video

no code implementations • 21 Feb 2017 • Li Li, Zhu Li, Madhukar Budagavi, Houqiang Li

This paper proposes a novel advanced motion model to handle the irregular motion for the cubic map projection of 360-degree video.

Video Captioning with Transferred Semantic Attributes

no code implementations • CVPR 2017 • Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

Automatically generating natural language descriptions of videos poses a fundamental challenge for the computer vision community.

Sentence Video Captioning

Comparative Deep Learning of Hybrid Representations for Image Recommendations

no code implementations • CVPR 2016 • Chenyi Lei, Dong Liu, Weiping Li, Zheng-Jun Zha, Houqiang Li

In many image-related tasks, learning expressive and discriminative representations of images is essential, and deep learning has been studied for automating the learning of such representations.

Semi-Supervised Domain Adaptation With Subspace Learning for Visual Recognition

no code implementations • CVPR 2015 • Ting Yao, Yingwei Pan, Chong-Wah Ngo, Houqiang Li, Tao Mei

In many real-world applications, we often face the problem of cross-domain learning, i.e., borrowing labeled data or transferring already-learned knowledge from a source domain to a target domain.

Domain Adaptation Object Recognition

SOM: Semantic Obviousness Metric for Image Quality Assessment

no code implementations • CVPR 2015 • Peng Zhang, Wengang Zhou, Lei Wu, Houqiang Li

We propose to extract two types of features: one to measure the semantic obviousness of the image and the other to discover local characteristics.

Image Quality Estimation No-Reference Image Quality Assessment

Jointly Modeling Embedding and Translation to Bridge Video and Language

no code implementations • CVPR 2016 • Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui

Our proposed LSTM-E consists of three components: 2-D and/or 3-D deep convolutional neural networks for learning powerful video representations, a deep RNN for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.

Sentence Translation

Separable Kernel for Image Deblurring

no code implementations • CVPR 2014 • Lu Fang, Haifeng Liu, Feng Wu, Xiaoyan Sun, Houqiang Li

In this paper, we approach the image deblurring problem from a completely new perspective by proposing a separable kernel to represent the inherent properties of the camera and scene system.

Deblurring Image Deblurring
