Search Results for author: Guanbin Li

Found 156 papers, 73 papers with code

Propagating Over Phrase Relations for One-Stage Visual Grounding

no code implementations • ECCV 2020 • Sibei Yang, Guanbin Li, Yizhou Yu

Phrase level visual grounding aims to locate in an image the corresponding visual regions referred to by multiple noun phrases in a given sentence.

Phrase Grounding Relational Reasoning +2

Fine-grained Spatial-temporal MLP Architecture for Metro Origin-Destination Prediction

no code implementations • 24 Apr 2024 • Yang Liu, Binglin Chen, Yongsen Zheng, Guanbin Li, Liang Lin

Specifically, our ODMixer has a double-branch structure and involves the Channel Mixer, the Multi-view Mixer, and the Bidirectional Trend Learner.

Scheduling

UniFL: Improve Stable Diffusion via Unified Feedback Learning

no code implementations • 8 Apr 2024 • Jiacheng Zhang, Jie Wu, Yuxi Ren, Xin Xia, Huafeng Kuang, Pan Xie, Jiashi Li, Xuefeng Xiao, Weilin Huang, Min Zheng, Lean Fu, Guanbin Li

Diffusion models have revolutionized the field of image generation, leading to the proliferation of high-quality models and diverse downstream applications.

Image Generation

OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation

no code implementations • 26 Mar 2024 • Ganlong Zhao, Guanbin Li, Weikai Chen, Yizhou Yu

Recent advances in Iterative Vision-and-Language Navigation (IVLN) introduce a more meaningful and practical paradigm of VLN by maintaining the agent's memory across tours of scenes.

Vision and Language Navigation

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

no code implementations • 26 Mar 2024 • Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Additionally, we present a DepthGradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels, effectively decoupling the depth gradient and removing conflicting gradients.

Monocular 3D Object Detection object-detection +1

NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation

no code implementations • 26 Mar 2024 • Jiahao Chen, Yipeng Qin, Lingjie Liu, Jiangbo Lu, Guanbin Li

Neural Radiance Field (NeRF) has been widely recognized for its excellence in novel view synthesis and 3D scene reconstruction.

3D Scene Reconstruction Novel View Synthesis

Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection

1 code implementation • ICCV 2023 • Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, YingYing Li, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

To tackle the confirmation bias from incorrect pseudo labels of minority classes, the class-rebalancing sampling module resamples unlabeled data following the guidance of the gradient-based reweighting module.

object-detection Object Detection +1

Annotation-Efficient Polyp Segmentation via Active Learning

no code implementations • 21 Mar 2024 • Duojun Huang, Xinyu Xiong, De-Jun Fan, Feng Gao, Xiao-Jian Wu, Guanbin Li

To minimize annotation costs, we propose a deep active learning framework for annotation-efficient polyp segmentation.

Active Learning Segmentation

Semi- and Weakly-Supervised Learning for Mammogram Mass Segmentation with Limited Annotations

no code implementations • 14 Mar 2024 • Xinyu Xiong, Churan Wang, Wenxue Li, Guanbin Li

Accurate identification of breast masses is crucial in diagnosing breast cancer; however, it can be challenging due to their small size and being camouflaged in surrounding normal glands.

Segmentation Weakly-supervised Learning

Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation

no code implementations • 9 Mar 2024 • Hairong Shi, Songhao Han, Shaofei Huang, Yue Liao, Guanbin Li, Xiangxing Kong, Hua Zhu, Xiaomu Wang, Si Liu

Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning.

Lesion Segmentation Segmentation +1

Large Multimodal Agents: A Survey

no code implementations • 23 Feb 2024 • Junlin Xie, Zhihong Chen, Ruifei Zhang, Xiang Wan, Guanbin Li

In this paper, we conduct a systematic review of LLM-driven multimodal agents, which we refer to as large multimodal agents (LMAs for short).

Decision Making

UniCell: Universal Cell Nucleus Classification via Prompt Learning

1 code implementation • 20 Feb 2024 • Junjia Huang, Haofeng Li, Xiang Wan, Guanbin Li

The recognition of multi-class cell nuclei can significantly facilitate the process of histopathological diagnosis.

Classification

Cell Graph Transformer for Nuclei Classification

1 code implementation • 20 Feb 2024 • Wei Lou, Guanbin Li, Xiang Wan, Haofeng Li

Nuclei classification is a critical step in computer-aided diagnosis with histopathology images.

Classification Nuclei Classification

MEIA: Towards Realistic Multimodal Interaction and Manipulation for Embodied Robots

1 code implementation • 1 Feb 2024 • Yang Liu, Xinshuai Song, Kaixuan Jiang, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin

To overcome this limitation, we introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions.

Embodied Question Answering Language Modelling +3

TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts

no code implementations • 26 Jan 2024 • Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan

To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region.

3D scene Editing

Inter-Domain Mixup for Semi-Supervised Domain Adaptation

no code implementations • 21 Jan 2024 • Jichang Li, Guanbin Li, Yizhou Yu

However, existing SSDA work fails to make full use of label information from both source and target domains for feature alignment across domains, resulting in label mismatch in the label space during model testing.

Semi-supervised Domain Adaptation Unsupervised Domain Adaptation

Adaptive Betweenness Clustering for Semi-Supervised Domain Adaptation

no code implementations • 21 Jan 2024 • Jichang Li, Guanbin Li, Yizhou Yu

Once the graph has been refined, Adaptive Betweenness Clustering is introduced to facilitate semantic transfer by using across-domain betweenness clustering and within-domain betweenness clustering, thereby propagating semantic label information from labeled samples across domains to unlabeled target data.

Clustering Semi-supervised Domain Adaptation +1

ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection

1 code implementation • 10 Jan 2024 • Yuncheng Jiang, Zixun Zhang, Yiwen Hu, Guanbin Li, Xiang Wan, Song Wu, Shuguang Cui, Silin Huang, Zhen Li

Accurate polyp detection is critical for early colorectal cancer diagnosis.

Contrastive Learning

Credible Teacher for Semi-Supervised Object Detection in Open Scene

no code implementations • 1 Jan 2024 • Jingyu Zhuang, Kuo Wang, Liang Lin, Guanbin Li

Credible Teacher adopts an interactive teaching mechanism using flexible labels to prevent uncertain pseudo labels from misleading the model and gradually reduces its uncertainty through the guidance of other credible pseudo labels.

object-detection Object Detection +1

Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation

no code implementations • 22 Dec 2023 • Chaowei Fang, Ziyin Zhou, Junye Chen, Hanjing Su, Qingyao Wu, Guanbin Li

We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.

Image Segmentation Segmentation +1

Removing Interference and Recovering Content Imaginatively for Visible Watermark Removal

no code implementations • 22 Dec 2023 • Yicheng Leng, Chaowei Fang, Gen Li, Yixiang Fang, Guanbin Li

Visible watermarks, while instrumental in protecting image copyrights, frequently distort the underlying content, complicating tasks like scene interpretation and image editing.

FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels

1 code implementation • 19 Dec 2023 • Jichang Li, Guanbin Li, Hui Cheng, Zicheng Liao, Yizhou Yu

However, these prior methods do not learn noise filters by exploiting knowledge across all clients, leading to sub-optimal and inferior noise filtering performance and thus damaging training stability.

Federated Learning Learning with noisy labels +1

GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance

no code implementations • 12 Dec 2023 • Haiming Zhang, Zhihao Yuan, Chaoda Zheng, Xu Yan, Baoyuan Wang, Guanbin Li, Song Wu, Shuguang Cui, Zhen Li

Our proposed GSmoothFace model mainly consists of the Audio to Expression Prediction (A2EP) module and the Target Adaptive Face Translation (TAFT) module.

Face Model Talking Face Generation

Open-Vocabulary Segmentation with Semantic-Assisted Calibration

1 code implementation • 7 Dec 2023 • Yong Liu, Sule Bai, Guanbin Li, Yitong Wang, Yansong Tang

We attribute this to the in-vocabulary embedding and domain-biased CLIP prediction.

Ranked #2 on Open Vocabulary Semantic Segmentation on PascalVOC-20

Attribute Open Vocabulary Semantic Segmentation

Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

no code implementations • 4 Dec 2023 • Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu

In this paper, we target the adaptive source driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt.

3D scene Editing

Diffusion-based Data Augmentation for Nuclei Image Segmentation

1 code implementation • 22 Oct 2023 • Xinyi Yu, Guanbin Li, Wei Lou, SiQi Liu, Xiang Wan, Yan Chen, Haofeng Li

Therefore, augmenting a dataset with only a few labeled images to improve the segmentation performance is of significant research and application value.

Data Augmentation Image Generation +3

Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection

1 code implementation • ICCV 2023 • Junjia Huang, Haofeng Li, Xiang Wan, Guanbin Li

Multi-class cell nuclei detection is a fundamental prerequisite in the diagnosis of histopathology.

Prompt-based Grouping Transformer for Nucleus Detection and Classification

1 code implementation • 22 Oct 2023 • Junjia Huang, Haofeng Li, Weijun Sun, Xiang Wan, Guanbin Li

Automatic nuclei detection and classification can produce effective information for disease diagnosis.

Classification Semantic Similarity +1

Multi-stream Cell Segmentation with Low-level Cues for Multi-modality Images

1 code implementation • 22 Oct 2023 • Wei Lou, Xinyi Yu, Chenyu Liu, Xiang Wan, Guanbin Li, SiQi Liu, Haofeng Li

Afterward, we train a separate segmentation model for each category using the images in the corresponding category.

Cell Segmentation Segmentation

Semantic-aware Temporal Channel-wise Attention for Cardiac Function Assessment

no code implementations • 9 Oct 2023 • Guanqi Chen, Guanbin Li

Cardiac function assessment aims at predicting left ventricular ejection fraction (LVEF) given an echocardiogram video, which requires models to focus on the changes in the left ventricle during the cardiac cycle.

Auxiliary Learning regression +1

ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models

1 code implementation • 3 Sep 2023 • Yuhao Du, Yuncheng Jiang, Shuangyi Tan, Xusheng Wu, Qi Dou, Zhen Li, Guanbin Li, Xiang Wan

Colonoscopy analysis, particularly automatic polyp segmentation and detection, is essential for assisting clinical diagnosis and treatment.

Segmentation

MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization

1 code implementation • 22 Aug 2023 • Tao Chen, Ze Lin, Hui Li, Jiayi Ji, Yiyi Zhou, Guanbin Li, Rongrong Ji

Furthermore, we model product attributes based on both text and image modalities so that multi-modal product characteristics can be manifested in the generated summaries.

Attribute

WMFormer++: Nested Transformer for Visible Watermark Removal via Implict Joint Learning

no code implementations • 20 Aug 2023 • Dongjian Huo, Zehong Zhang, Hanjing Su, Guanbin Li, Chaowei Fang, Qingyao Wu

Existing watermark removal methods mainly rely on UNet with task-specific decoder branches--one for watermark localization and the other for background image restoration.

Decoder Image Restoration +1

Advancing Visual Grounding with Scene Knowledge: Benchmark and Method

1 code implementation • CVPR 2023 • Zhihong Chen, Ruifei Zhang, Yibing Song, Xiang Wan, Guanbin Li

Therefore, in this paper, we propose a novel benchmark of Scene Knowledge-guided Visual Grounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to have a reasoning ability on the long-form scene knowledge.

Image-text matching Text Matching +1

Divide and Adapt: Active Domain Adaptation via Customized Learning

1 code implementation • CVPR 2023 • Duojun Huang, Jichang Li, Weikai Chen, Junshi Huang, Zhenhua Chai, Guanbin Li

To accommodate active learning and domain adaptation, two naturally different tasks, in a collaborative framework, we advocate that a customized learning strategy for the target data is the key to the success of ADA solutions.

Active Learning Informativeness +3

Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

1 code implementation • ICCV 2023 • Zunnan Xu, Zhihong Chen, Yong Zhang, Yibing Song, Xiang Wan, Guanbin Li

Parameter Efficient Tuning (PET) has gained attention for reducing the number of parameters while maintaining performance and providing better hardware resource savings, but few studies investigate dense prediction tasks and interaction between modalities.

Ranked #2 on Referring Expression Segmentation on RefCOCO

Decoder Image Segmentation +3

Improved Distribution Matching for Dataset Condensation

2 code implementations • CVPR 2023 • Ganlong Zhao, Guanbin Li, Yipeng Qin, Yizhou Yu

In this paper, we propose a novel dataset condensation method based on distribution matching, which is more efficient and promising.

Dataset Condensation Model Optimization

SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training

1 code implementation • ICCV 2023 • Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, Liang Lin

Moreover, these methods ignore how to utilize the fine-grained dependencies among different skeleton joints to pre-train an efficient skeleton sequence learning model that can generalize well across different datasets.

Action Recognition Decoder +2

Semi-DETR: Semi-Supervised Object Detection with Detection Transformers

3 code implementations • CVPR 2023 • Jiacheng Zhang, Xiangru Lin, Wei Zhang, Kuo Wang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Specifically, we propose a Stage-wise Hybrid Matching strategy that combines the one-to-many assignment and one-to-one assignment strategies to improve the training efficiency of the first stage and thus provide high-quality pseudo labels for the training of the second stage.

Ranked #1 on Semi-Supervised Object Detection on COCO 5% labeled data

Object object-detection +3

Universal Semi-supervised Model Adaptation via Collaborative Consistency Training

no code implementations • 7 Jul 2023 • Zizheng Yan, Yushuang Wu, Yipeng Qin, Xiaoguang Han, Shuguang Cui, Guanbin Li

In this paper, we introduce a realistic and challenging domain adaptation problem called Universal Semi-supervised Model Adaptation (USMA), which i) requires only a pre-trained source model, ii) allows the source and target domain to have different label sets, i.e., they share a common label set and hold their own private label set, and iii) requires only a few labeled samples in each class of the target domain.

Domain Adaptation

Exploration and Exploitation of Unlabeled Data for Open-Set Semi-Supervised Learning

no code implementations • 30 Jun 2023 • Ganlong Zhao, Guanbin Li, Yipeng Qin, Jinjin Zhang, Zhenhua Chai, Xiaolin Wei, Liang Lin, Yizhou Yu

In this paper, we address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples.

CausalVLR: A Toolbox and Benchmark for Visual-Linguistic Causal Reasoning

2 code implementations • 30 Jun 2023 • Yang Liu, Weixing Chen, Guanbin Li, Liang Lin

We present CausalVLR (Causal Visual-Linguistic Reasoning), an open-source toolbox containing a rich set of state-of-the-art causal relation discovery and causal inference methods for various visual-linguistic reasoning tasks, such as VQA, image/video captioning, medical report generation, model generalization and robustness, etc.

Causal Inference Medical Report Generation +2

DreamEditor: Text-Driven 3D Scene Editing with Neural Fields

1 code implementation • 23 Jun 2023 • Jingyu Zhuang, Chen Wang, Lingjie Liu, Liang Lin, Guanbin Li

Neural fields have achieved impressive advancements in view synthesis and scene reconstruction.

3D scene Editing

DenseLight: Efficient Control for Large-scale Traffic Signals with Dense Feedback

1 code implementation • 13 Jun 2023 • Junfan Lin, Yuying Zhu, Lingbo Liu, Yang Liu, Guanbin Li, Liang Lin

1) The travel time of a vehicle is delayed feedback on the effectiveness of TSC policy at each traffic intersection since it is obtained after the vehicle has left the road network.

Reinforcement Learning (RL)

Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

no code implementations • CVPR 2023 • Ricong Huang, Peiwen Lai, Yipeng Qin, Guanbin Li

In this work, we break these trade-offs with our novel parametric implicit face representation and propose a novel audio-driven facial reenactment framework that is both controllable and can generate high-quality talking heads.

Data Augmentation Image Inpainting

YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection

no code implementations • 6 Jun 2023 • Yuncheng Jiang, Zixun Zhang, Ruimao Zhang, Guanbin Li, Shuguang Cui, Zhen Li

YONA fully exploits the information of one previous adjacent frame and conducts polyp detection on the current frame without multi-frame collaborations.

Contrastive Learning

Long-term Wind Power Forecasting with Hierarchical Spatial-Temporal Transformer

no code implementations • 30 May 2023 • Yang Zhang, Lingbo Liu, Xinyu Xiong, Guanbin Li, Guoli Wang, Liang Lin

In this work, we propose a novel end-to-end wind power forecasting model named Hierarchical Spatial-Temporal Transformer Network (HSTTN) to address the long-term WPF problems.

Decoder

Identity-Preserving Talking Face Generation with Landmark and Appearance Priors

1 code implementation • CVPR 2023 • Weizhi Zhong, Chaowei Fang, Yinqi Cai, Pengxu Wei, Gangming Zhao, Liang Lin, Guanbin Li

Prior landmark characteristics of the speaker's face are employed to make the generated landmarks coincide with the facial outline of the speaker.

Talking Face Generation

Visual Causal Scene Refinement for Video Question Answering

2 code implementations • 7 May 2023 • Yushen Wei, Yang Liu, Hong Yan, Guanbin Li, Liang Lin

Our VCSR involves two essential modules: i) the Question-Guided Refiner (QGR) module, which refines consecutive video frames guided by the question semantics to obtain more representative segment features for causal front-door intervention; ii) the Causal Scene Separator (CSS) module, which discovers a collection of visual causal and non-causal scenes based on the visual-linguistic causal relevance and estimates the causal effect of the scene-separating intervention in a contrastive learning manner.

Contrastive Learning Question Answering +2

SCoDA: Domain Adaptive Shape Completion for Real Scans

1 code implementation • CVPR 2023 • Yushuang Wu, Zizheng Yan, Ce Chen, Lai Wei, Xiao Li, Guanbin Li, Yihao Li, Shuguang Cui, Xiaoguang Han

Thus, we propose a new task, SCoDA, for the domain adaptation of real scan shape completion from synthetic data.

Benchmarking Domain Adaptation +1

Urban Regional Function Guided Traffic Flow Prediction

no code implementations • 17 Mar 2023 • Kuo Wang, Lingbo Liu, Yang Liu, Guanbin Li, Fan Zhou, Liang Lin

The prediction of traffic flow is a challenging yet crucial problem in spatial-temporal analysis, which has recently gained increasing interest.

Cross-Modal Causal Intervention for Medical Report Generation

2 code implementations • 16 Mar 2023 • Weixing Chen, Yang Liu, Ce Wang, Jiarui Zhu, Shen Zhao, Guanbin Li, Cheng-Lin Liu, Liang Lin

Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance, which can relieve the heavy burden of radiologists by automatically generating the corresponding medical reports according to the given radiology image.

Medical Report Generation object-detection +1

Structure Embedded Nucleus Classification for Histopathology Images

no code implementations • 22 Feb 2023 • Wei Lou, Xiang Wan, Guanbin Li, Xiaoying Lou, Chenghang Li, Feng Gao, Haofeng Li

Next, we convert a histopathology image into a graph structure with nuclei as nodes, and build a graph neural network to embed the spatial distribution of nuclei into their representations.

Classification Graph structure learning +1

Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts

1 code implementation • ICCV 2023 • Zhihong Chen, Shizhe Diao, Benyou Wang, Guanbin Li, Xiang Wan

Medical vision-and-language pre-training (Med-VLP) has shown promising improvements on many downstream medical tasks owing to its applicability to extracting generic representations from medical images and texts.

Image Retrieval Image-text Classification +7

Adaptive Context Selection for Polyp Segmentation

1 code implementation • 12 Jan 2023 • Ruifei Zhang, Guanbin Li, Zhen Li, Shuguang Cui, Dahong Qian, Yizhou Yu

To tackle these issues, we propose an adaptive context selection based encoder-decoder framework which is composed of a Local Context Attention (LCA) module, a Global Context Module (GCM), and an Adaptive Selection Module (ASM).

Decoder Segmentation

Self-Supervised Correction Learning for Semi-Supervised Biomedical Image Segmentation

1 code implementation • 12 Jan 2023 • Ruifei Zhang, Sishuo Liu, Yizhou Yu, Guanbin Li

Since the two tasks rely on similar feature information, the unlabeled data effectively enhances the representation of the network to the lesion regions and further improves the segmentation performance.

Image Segmentation Medical Image Segmentation +3

Lesion-aware Dynamic Kernel for Polyp Segmentation

1 code implementation • 12 Jan 2023 • Ruifei Zhang, Peiwen Lai, Xiang Wan, De-Jun Fan, Feng Gao, Xiao-Jian Wu, Guanbin Li

Automatic and accurate polyp segmentation plays an essential role in early colorectal cancer diagnosis.

Decoder Segmentation

Enhanced Soft Label for Semi-Supervised Semantic Segmentation

no code implementations • ICCV 2023 • Jie Ma, Chuan Wang, Yang Liu, Liang Lin, Guanbin Li

As a mainstream framework in the field of semi-supervised learning (SSL), self-training via pseudo labeling and its variants have witnessed impressive progress in semi-supervised semantic segmentation with the recent advance of deep neural networks.

Contrastive Learning Pseudo Label +1

RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels

no code implementations • ICCV 2023 • Ziyi Zhang, Weikai Chen, Chaowei Fang, Zhen Li, Lechao Chen, Liang Lin, Guanbin Li

Confidence-wise, we propose a novel sample selection strategy based on confidence representation voting instead of the widely-used small-loss criterion.

Learning with noisy labels Representation Learning +1

Which Pixel to Annotate: a Label-Efficient Nuclei Segmentation Framework

1 code implementation • 20 Dec 2022 • Wei Lou, Haofeng Li, Guanbin Li, Xiaoguang Han, Xiang Wan

Recently, deep neural networks, which require a large number of annotated samples, have been widely applied in nuclei instance segmentation of H&E stained pathology images.

Instance Segmentation Segmentation +1

BoxPolyp: Boost Generalized Polyp Segmentation Using Extra Coarse Bounding Box Annotations

1 code implementation • 7 Dec 2022 • Jun Wei, Yiwen Hu, Guanbin Li, Shuguang Cui, S Kevin Zhou, Zhen Li

In practice, box annotations are applied to alleviate the over-fitting issue of previous polyp segmentation models, which generate fine-grained polyp area through the iterative boosted segmentation model.

Segmentation

Compound Batch Normalization for Long-tailed Image Classification

no code implementations • 2 Dec 2022 • Lechao Cheng, Chaowei Fang, Dingwen Zhang, Guanbin Li, Gang Huang

It can model the feature space more comprehensively and reduce the dominance of head classes.

Classification Image Classification

Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning

2 code implementations • 12 Nov 2022 • Ziyi Zhang, Weikai Chen, Hui Cheng, Zhen Li, Siyuan Li, Liang Lin, Guanbin Li

We investigate a practical domain adaptation task, called source-free domain adaptation (SFUDA), where the source-pretrained model is adapted to the target domain without access to the source data.

Ranked #4 on Source-Free Domain Adaptation on VisDA-2017

Contrastive Learning Source-Free Domain Adaptation

Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training

1 code implementation • CVPR 2023 • Junfan Lin, Jianlong Chang, Lingbo Liu, Guanbin Li, Liang Lin, Qi Tian, Chang Wen Chen

During inference, instead of changing the motion generator, our method reformulates the input text into a masked motion as the prompt for the motion generator to "reconstruct" the motion.

Language Modelling Zero-Shot Learning

View-Disentangled Transformer for Brain Lesion Detection

1 code implementation • 20 Sep 2022 • Haofeng Li, Junjia Huang, Guanbin Li, Zhou Liu, Yihong Zhong, Yingying Chen, Yunfei Wang, Xiang Wan

Deep neural networks (DNNs) have been widely adopted in brain lesion detection and segmentation.

Lesion Detection

Attentive Symmetric Autoencoder for Brain MRI Segmentation

1 code implementation • 19 Sep 2022 • Junjia Huang, Haofeng Li, Guanbin Li, Xiang Wan

Self-supervised learning methods based on image patch reconstruction have witnessed great success in training auto-encoders, whose pre-trained weights can be transferred to fine-tune other downstream tasks of image understanding.

Image Segmentation MRI segmentation +3

Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training

1 code implementation • 15 Sep 2022 • Zhihong Chen, Yuhao Du, Jinpeng Hu, Yang Liu, Guanbin Li, Xiang Wan, Tsung-Hui Chang

Besides, we conduct further analysis to better verify the effectiveness of different components of our approach and various settings of pre-training.

Self-Supervised Learning

Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge

1 code implementation • 15 Sep 2022 • Zhihong Chen, Guanbin Li, Xiang Wan

Most existing methods mainly contain three elements: uni-modal encoders (i.e., a vision encoder and a language encoder), a multi-modal fusion module, and pretext tasks, with few studies considering the importance of medical domain expert knowledge and explicitly exploiting such knowledge to facilitate Med-VLP.

Neighborhood Collective Estimation for Noisy Label Identification and Correction

1 code implementation • 5 Aug 2022 • Jichang Li, Guanbin Li, Feng Liu, Yizhou Yu

Specifically, our method is divided into two steps: 1) Neighborhood Collective Noise Verification to separate all training samples into a clean or noisy subset, 2) Neighborhood Collective Label Correction to relabel noisy samples, and then auxiliary techniques are used to assist further model optimization.

Learning with noisy labels Model Optimization

Robust Real-World Image Super-Resolution against Adversarial Attacks

1 code implementation • 31 Jul 2022 • Jiutao Yue, Haofeng Li, Pengxu Wei, Guanbin Li, Liang Lin

Since the frequency masking may not only destroy the adversarial perturbations but also affect the sharp details in a clean image, we further develop an adversarial sample classifier based on the frequency domain of images to determine whether to apply the proposed mask module.

Image Super-Resolution

Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels

1 code implementation • 29 Jul 2022 • Ganlong Zhao, Guanbin Li, Yipeng Qin, Feng Liu, Yizhou Yu

In this paper, we propose a two-stage clean samples identification method to address the aforementioned challenge.

Ranked #3 on Image Classification on Clothing1M (using extra training data)

Image Classification

Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering

2 code implementations • 26 Jul 2022 • Yang Liu, Guanbin Li, Liang Lin

Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture event temporality, causality, and dynamics spanning over the video.

Causal Inference Question Answering +2

Less is More: Adaptive Curriculum Learning for Thyroid Nodule Diagnosis

1 code implementation • 2 Jul 2022 • Haifan Gong, Hui Cheng, Yifan Xie, Shuangyi Tan, Guanqi Chen, Fei Chen, Guanbin Li

Thyroid nodule classification aims at determining whether the nodule is benign or malignant based on a given ultrasound image.

Classification

BronchusNet: Region and Structure Prior Embedded Representation Learning for Bronchus Segmentation and Classification

no code implementations • 14 May 2022 • Wenhao Huang, Haifan Gong, Huan Zhang, Yu Wang, Haofeng Li, Guanbin Li, Hong Shen

CT-based bronchial tree analysis plays an important role in the computer-aided diagnosis for respiratory diseases, as it could provide structured information for clinicians.

Classification Graph Learning +3

Multi-level Consistency Learning for Semi-supervised Domain Adaptation

1 code implementation • 9 May 2022 • Zizheng Yan, Yushuang Wu, Guanbin Li, Yipeng Qin, Xiaoguang Han, Shuguang Cui

Semi-supervised domain adaptation (SSDA) aims to apply knowledge learned from a fully labeled source domain to a scarcely labeled target domain.

Ranked #1 on Semi-supervised Domain Adaptation on VisDA2017

Domain Adaptation Semi-supervised Domain Adaptation

Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

1 code implementation • CVPR 2022 • Xiaoqian Xu, Pengxu Wei, Weikai Chen, Mingzhi Mao, Liang Lin, Guanbin Li

To address this issue, we propose an unsupervised domain adaptation mechanism for real-world SR, named Dual ADversarial Adaptation (DADA), which only requires LR images in the target domain with available real paired data from a source camera.

Image Super-Resolution Unsupervised Domain Adaptation

Paper
Code

Causal Reasoning Meets Visual Representation Learning: A Prospective Study

no code implementations • 26 Apr 2022 • Yang Liu, Yushen Wei, Hong Yan, Guanbin Li, Liang Lin

Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing.

Benchmarking Out-of-Distribution Generalization +2

Paper
Add Code

Open Set Domain Adaptation By Novel Class Discovery

no code implementations • 7 Mar 2022 • Jingyu Zhuang, Ziliang Chen, Pengxu Wei, Guanbin Li, Liang Lin

In Open Set Domain Adaptation (OSDA), large amounts of target samples are drawn from the implicit categories that never appear in the source domain.

Domain Adaptation Novel Class Discovery

Paper
Add Code

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

1 code implementation • CVPR 2022 • Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Zhen Li, Shuguang Cui

Thus, a more faithful caption can be generated only using point clouds during the inference.

3D dense captioning Dense Captioning +2

Paper
Code

Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning

1 code implementation • 26 Feb 2022 • Pengxiang Yan, Ziyi Wu, Mengmeng Liu, Kun Zeng, Liang Lin, Guanbin Li

To relieve the burden of labor-intensive labeling, deep unsupervised SOD methods have been proposed to exploit noisy labels generated by handcrafted saliency methods.

object-detection Object Detection +2

Paper
Code

PointMatch: A Consistency Training Framework for Weakly Supervised Semantic Segmentation of 3D Point Clouds

no code implementations • 22 Feb 2022 • Yushuang Wu, Zizheng Yan, Shengcai Cai, Guanbin Li, Yizhou Yu, Xiaoguang Han, Shuguang Cui

Semantic segmentation of point cloud usually relies on dense annotation that is exhausting and costly, so it attracts wide attention to investigate solutions for the weakly supervised scheme with only sparse points annotated.

Representation Learning Weakly supervised Semantic Segmentation +1

Paper
Add Code

Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image Segmentation

1 code implementation • 8 Feb 2022 • Xinkai Zhao, Chaowei Fang, De-Jun Fan, Xutao Lin, Feng Gao, Guanbin Li

Semi-supervised learning (SSL), which aims at leveraging a few labeled images and a large number of unlabeled images for network training, is beneficial for relieving the burden of data annotation in medical image segmentation.

Contrastive Learning Image Segmentation +5

Paper
Code

Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction

no code implementations • 30 Nov 2021 • Lingbo Liu, Zewei Yang, Guanbin Li, Kuo Wang, Tianshui Chen, Liang Lin

Land remote sensing analysis is a crucial research in earth science.

Representation Learning

Paper
Add Code

Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation

no code implementations • 16 Oct 2021 • Yang Wu, Shirui Feng, Guanbin Li, Liang Lin

PEMR includes a "looking ahead" process, i.e., a visual feature extractor module that estimates feasible paths for gathering 3D navigational information, mimicking the human sense of direction.

Common Sense Reasoning Embodied Question Answering +1

Paper
Add Code

Road Network Guided Fine-Grained Urban Traffic Flow Inference

1 code implementation • 29 Sep 2021 • Lingbo Liu, Mengmeng Liu, Guanbin Li, Ziyi Wu, Junfan Lin, Liang Lin

Furthermore, we take the road network feature as a query to capture the long-range spatial distribution of traffic flow with a transformer architecture.

Paper
Code

Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning

no code implementations • ICCV 2021 • Junkai Huang, Chaowei Fang, Weikai Chen, Zhenhua Chai, Xiaolin Wei, Pengxu Wei, Liang Lin, Guanbin Li

Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.

Binary Classification

Paper
Add Code

Towards Interpretable Deep Networks for Monocular Depth Estimation

1 code implementation • ICCV 2021 • Zunzhi You, Yi-Hsuan Tsai, Wei-Chen Chiu, Guanbin Li

Based on our observations, we quantify the interpretability of a deep MDE network by the depth selectivity of its hidden units.

Monocular Depth Estimation

Paper
Code

Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video

no code implementations • 9 Aug 2021 • Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, YingYing Li, Errui Ding, Liang Lin

In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.

Anomaly Detection

Paper
Add Code

Colorectal Polyp Classification from White-light Colonoscopy Images via Domain Alignment

no code implementations • 5 Aug 2021 • Qin Wang, Hui Che, Weizhen Ding, Li Xiang, Guanbin Li, Zhen Li, Shuguang Cui

Thus, we propose a novel framework based on a teacher-student architecture for the accurate colorectal polyp classification (CPC) through directly using white-light (WL) colonoscopy images in the examination.

Contrastive Learning

Paper
Add Code

Online Metro Origin-Destination Prediction via Heterogeneous Information Aggregation

1 code implementation • 2 Jul 2021 • Lingbo Liu, Yuying Zhu, Guanbin Li, Ziyi Wu, Lei Bai, Liang Lin

In this work, we propose a novel neural network module termed Heterogeneous Information Aggregation Machine (HIAM), which fully exploits heterogeneous information of historical data (e.g., incomplete OD matrices, unfinished order vectors, and DO matrices) to jointly learn the evolutionary patterns of OD and DO ridership.

Time Series Analysis

Paper
Code

Bottom-Up Shift and Reasoning for Referring Image Segmentation

1 code implementation • CVPR 2021 • Sibei Yang, Meng Xia, Guanbin Li, Hong-Yu Zhou, Yizhou Yu

In this paper, we tackle the challenge by jointly performing compositional visual reasoning and accurate segmentation in a single stage via the proposed novel Bottom-Up Shift (BUS) and Bidirectional Attentive Refinement (BIAR) modules.

Image Segmentation Segmentation +2

Paper
Code

Cross-Modal Progressive Comprehension for Referring Segmentation

1 code implementation • 15 May 2021 • Si Liu, Tianrui Hui, Shaofei Huang, Yunchao Wei, Bo Li, Guanbin Li

In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) scheme to effectively mimic human behaviors and implement it as a CMPC-I (Image) module and a CMPC-V (Video) module to improve referring image and video segmentation models.

Ranked #7 on Referring Expression Segmentation on J-HMDB

Attribute Image Segmentation +5

Paper
Code

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

no code implementations • CVPR 2021 • Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang

Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, they also inevitably introduce misaligned spatial information from adjacent frames, which confuses features of the target frame and yields inaccurate segmentation.

Ranked #8 on Referring Expression Segmentation on J-HMDB

Decoder feature selection +1

Paper
Add Code

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

2 code implementations • CVPR 2021 • Jichang Li, Guanbin Li, Yemin Shi, Yizhou Yu

Pseudo labeling expands the number of "labeled" samples in each class in the target domain, and thus produces a more robust and powerful cluster core for each class to facilitate adversarial learning.

Clustering Domain Adaptation +1

Paper
Code

Deep Transformers for Fast Small Intestine Grounding in Capsule Endoscope Video

no code implementations • 7 Apr 2021 • Xinkai Zhao, Chaowei Fang, Feng Gao, De-Jun Fan, Xutao Lin, Guanbin Li

In this paper, we propose a deep model to ground shooting range of small intestine from a capsule endoscope video which has duration of tens of hours.

Paper
Add Code

Scene-Intuitive Agent for Remote Embodied Visual Grounding

no code implementations • CVPR 2021 • Xiangru Lin, Guanbin Li, Yizhou Yu

Intuitively, we comprehend the semantics of the instruction to form an overview of where a bathroom is and what a blue towel is in mind; then, we navigate to the target location by consistently matching the bathroom appearance in mind with the current scene.

Navigate Referring Expression +1

Paper
Add Code

LapsCore: Language-Guided Person Search via Color Reasoning

no code implementations • ICCV 2021 • Yushuang Wu, Zizheng Yan, Xiaoguang Han, Guanbin Li, Changqing Zou, Shuguang Cui

The key point of language-guided person search is to construct the cross-modal association between visual and textual input.

Colorization Image Colorization +2

Paper
Add Code

Adversarial Training using Contrastive Divergence

no code implementations • 1 Jan 2021 • Hongjun Wang, Guanbin Li, Liang Lin

To protect the security of machine learning models against adversarial examples, adversarial training becomes the most popular and powerful strategy against various adversarial attacks by injecting adversarial examples into training data.

Paper
Add Code

Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting

1 code implementation • CVPR 2021 • Lingbo Liu, Jiaqi Chen, Hefeng Wu, Guanbin Li, Chenglong Li, Liang Lin

Extensive experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting.

Crowd Counting Representation Learning

Paper
Code

Human-centric Spatio-Temporal Video Grounding With Visual Transformers

1 code implementation • 10 Nov 2020 • Zongheng Tang, Yue Liao, Si Liu, Guanbin Li, Xiaojie Jin, Hongxu Jiang, Qian Yu, Dong Xu

HC-STVG is a video grounding task that requires both spatial (where) and temporal (when) localization.

Referring Expression Sentence +3

Paper
Code

A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning

no code implementations • 15 Oct 2020 • Hongjun Wang, Guanbin Li, Xiaobai Liu, Liang Lin

Although deep convolutional neural networks (CNNs) have demonstrated remarkable performance on multiple computer vision tasks, research on adversarial learning has shown that deep models are vulnerable to adversarial examples, which are crafted by adding visually imperceptible perturbations to the input images.

Adversarial Attack

Paper
Add Code

Contralaterally Enhanced Networks for Thoracic Disease Detection

no code implementations • 9 Oct 2020 • Gangming Zhao, Chaowei Fang, Guanbin Li, Licheng Jiao, Yizhou Yu

Aimed at improving the performance of existing detection methods, we propose a deep end-to-end module to exploit the contralateral context information for enhancing feature representations of disease proposals.

Paper
Add Code

Linguistic Structure Guided Context Modeling for Referring Image Segmentation

1 code implementation • ECCV 2020 • Tianrui Hui, Si Liu, Shaofei Huang, Guanbin Li, Sansi Yu, Faxi Zhang, Jizhong Han

Referring image segmentation aims to predict the foreground mask of the object referred by a natural language sentence.

Dependency Parsing Image Segmentation +3

Paper
Code

Referring Image Segmentation via Cross-Modal Progressive Comprehension

1 code implementation • CVPR 2020 • Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li

In addition to the CMPC module, we further leverage a simple yet effective TGFE module to integrate the reasoned multimodal features from different levels with the guidance of textual information.

Ranked #14 on Referring Expression Segmentation on RefCOCO testB

Attribute Image Segmentation +2

Paper
Code

Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos

no code implementations • 18 Sep 2020 • Jie Wu, Guanbin Li, Xiaoguang Han, Liang Lin

Temporal grounding of natural language in untrimmed videos is a fundamental yet challenging multimedia task facilitating cross-media visual content retrieval.

reinforcement-learning Reinforcement Learning (RL) +2

Paper
Add Code

Online Alternate Generator against Adversarial Attacks

no code implementations • 17 Sep 2020 • Haofeng Li, Yirui Zeng, Guanbin Li, Liang Lin, Yizhou Yu

The field of computer vision has witnessed phenomenal progress in recent years partially due to the development of deep convolutional neural networks.

Paper
Add Code

Collaborative Training between Region Proposal Localization and Classification for Domain Adaptive Object Detection

1 code implementation • ECCV 2020 • Ganlong Zhao, Guanbin Li, Ruijia Xu, Liang Lin

Domain adaptation for object detection tries to adapt the detector from labeled datasets to unlabeled ones for better performance.

Domain Adaptation General Classification +4

Paper
Code

Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition

1 code implementation • 1 Sep 2020 • Yang Liu, Keze Wang, Guanbin Li, Liang Lin

In this paper, we propose a novel framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos) by adaptively transferring and distilling the knowledge from multiple wearable sensors.

Action Recognition Image Generation +3

Paper
Code

Graph-Structured Referring Expression Reasoning in The Wild

1 code implementation • CVPR 2020 • Sibei Yang, Guanbin Li, Yizhou Yu

The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression.

Referring Expression

Paper
Code

Peeking into occluded joints: A novel framework for crowd pose estimation

1 code implementation • ECCV 2020 • Lingteng Qiu, Xuanye Zhang, Yan-ran Li, Guanbin Li, Xiao-Jun Wu, Zixiang Xiong, Xiaoguang Han, Shuguang Cui

Although occlusion widely exists in nature and remains a fundamental challenge for pose estimation, existing heatmap-based approaches suffer serious degradation on occlusions.

Pose Estimation

Paper
Code

Efficient Crowd Counting via Structured Knowledge Transfer

2 code implementations • 23 Mar 2020 • Lingbo Liu, Jiaqi Chen, Hefeng Wu, Tianshui Chen, Guanbin Li, Liang Lin

Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.

Crowd Counting Transfer Learning

Paper
Code

Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread

no code implementations • 22 Jan 2020 • Haofeng Li, Guanbin Li, BinBin Yang, Guanqi Chen, Liang Lin, Yizhou Yu

The proposed algorithm for the first time achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.

Image Classification Object +4

Paper
Add Code

Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video

1 code implementation • 18 Jan 2020 • Jie Wu, Guanbin Li, Si Liu, Liang Lin

Temporally language grounding in untrimmed videos is a newly-raised task in video understanding.

Decision Making reinforcement-learning +2

Paper
Code

Physical-Virtual Collaboration Modeling for Intra-and Inter-Station Metro Ridership Prediction

2 code implementations • 14 Jan 2020 • Lingbo Liu, Jingwen Chen, Hefeng Wu, Jiajie Zhen, Guanbin Li, Liang Lin

To address this problem, we model a metro system as graphs with various topologies and propose a unified Physical-Virtual Collaboration Graph Network (PVCGN), which can effectively learn the complex ridership patterns from the tailor-designed graphs.

Representation Learning

Paper
Code

An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

no code implementations • 18 Dec 2019 • Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, Guanbin Li, Liang Lin

In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defending against pointwise feature space adversarial perturbations.

Position Segmentation +2

Paper
Add Code

Self-Enhanced Convolutional Network for Facial Video Hallucination

no code implementations • 23 Nov 2019 • Chaowei Fang, Guanbin Li, Xiaoguang Han, Yizhou Yu

It further recurrently exploits the reconstructed results and intermediate features of a sequence of preceding frames to improve the initial super-resolution of the current frame by modelling the coherence of structural facial features across frames.

Hallucination Video Super-Resolution

Paper
Add Code

Globally Guided Progressive Fusion Network for 3D Pancreas Segmentation

no code implementations • 23 Nov 2019 • Chaowei Fang, Guanbin Li, Chengwei Pan, Yiming Li, Yizhou Yu

Recently 3D volumetric organ segmentation attracts much research interest in medical image analysis due to its significance in computer aided diagnosis.

Organ Segmentation Pancreas Segmentation +1

Paper
Add Code

Knowledge Graph Transfer Network for Few-Shot Recognition

1 code implementation • 21 Nov 2019 • Riquan Chen, Tianshui Chen, Xiaolu Hui, Hefeng Wu, Guanbin Li, Liang Lin

In this work, we represent the semantic correlations in the form of structured knowledge graph and integrate this graph into deep neural networks to promote few-shot learning by a novel Knowledge Graph Transfer Network (KGTN).

Ranked #1 on Few-Shot Image Classification on ImageNet-FS (10-shot, all)

Few-Shot Image Classification Few-Shot Learning +2

Paper
Code

Learning to Recognize the Unseen Visual Predicates

no code implementations • 25 Sep 2019 • Defa Zhu, Si Liu, Wentao Jiang, Guanbin Li, Tianyi Wu, Guodong Guo

Visual relationship recognition models are limited in the ability to generalize from finite seen predicates to unseen ones.

Question Answering Visual Question Answering +1

Paper
Add Code

Dynamic Graph Attention for Referring Expression Comprehension

no code implementations • ICCV 2019 • Sibei Yang, Guanbin Li, Yizhou Yu

In this paper, we explore the problem of referring expression comprehension from the perspective of language-driven visual reasoning, and propose a dynamic graph attention network to perform multi-step reasoning by modeling both the relationships among the objects in the image and the linguistic structure of the expression.

Graph Attention Referring Expression +2

Paper
Add Code

Motion Guided Attention for Video Salient Object Detection

2 code implementations • ICCV 2019 • Haofeng Li, Guanqi Chen, Guanbin Li, Yizhou Yu

In this paper, we develop a multi-task motion guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks, one sub-network for salient object detection in still images and the other for motion saliency detection in optical flow images.

Object object-detection +4

Paper
Code

A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension

no code implementations • CVPR 2020 • Yue Liao, Si Liu, Guanbin Li, Fei Wang, Yanjie Chen, Chen Qian, Bo Li

RCCF reformulates the referring expression comprehension as a correlation filtering process.

Referring Expression Referring Expression Comprehension

Paper
Add Code

Dynamic Spatial-Temporal Representation Learning for Traffic Flow Prediction

2 code implementations • 2 Sep 2019 • Lingbo Liu, Jiajie Zhen, Guanbin Li, Geng Zhan, Zhaocheng He, Bowen Du, Liang Lin

Specifically, the first ConvLSTM unit takes normal traffic flow features as input and generates a hidden state at each time-step, which is further fed into the connected convolutional layer for spatial attention map inference.

Representation Learning Traffic Prediction

Paper
Code

Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid

no code implementations • ICCV 2019 • Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang

To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales.

Ranked #4 on Image Retrieval on DeepFashion - Consumer-to-shop (Rank-1 metric)

Image Retrieval Retrieval

Paper
Add Code

Crowd Counting with Deep Structured Scale Integration Network

no code implementations • ICCV 2019 • Lingbo Liu, Zhilin Qiu, Guanbin Li, Shufan Liu, Wanli Ouyang, Liang Lin

Automatic estimation of the number of people in unconstrained crowded scenes is a challenging task and one major difficulty stems from the huge scale variation of people.

Crowd Counting Representation Learning

Paper
Add Code

Semi-Supervised Video Salient Object Detection Using Pseudo-Labels

1 code implementation • ICCV 2019 • Pengxiang Yan, Guanbin Li, Yuan Xie, Zhen Li, Chuan Wang, Tianshui Chen, Liang Lin

Specifically, we present an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module.

Ranked #1 on Video Salient Object Detection on VOS-T (using extra training data)

object-detection Salient Object Detection +2

Paper
Code

Semi-supervised Skin Detection by Network with Mutual Guidance

no code implementations • ICCV 2019 • Yi He, Jiayuan Shi, Chuan Wang, Haibin Huang, Jiaming Liu, Guanbin Li, Risheng Liu, Jue Wang

In this paper we present a new data-driven method for robust skin detection from a single human portrait image.

Decoder

Paper
Add Code

Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching

1 code implementation • 8 Jul 2019 • Ziliang Chen, Zhanfu Yang, Xiaoxi Wang, Xiaodan Liang, Xiaopeng Yan, Guanbin Li, Liang Lin

A broad range of cross-$m$-domain generation research boils down to matching a joint distribution by deep generative models (DGMs).

Paper
Code

Relationship-Embedded Representation Learning for Grounding Referring Expressions

1 code implementation • CVPR 2019 • Sibei Yang, Guanbin Li, Yizhou Yu

Unfortunately, existing work on grounding referring expressions fails to accurately extract multi-order relationships from the referring expression and associate them with the objects and their related contexts in the image.

Referring Expression Representation Learning

Paper
Code

Contextualized Spatial-Temporal Network for Taxi Origin-Destination Demand Prediction

no code implementations • 15 May 2019 • Lingbo Liu, Zhilin Qiu, Guanbin Li, Qing Wang, Wanli Ouyang, Liang Lin

Finally, a GCC module is applied to model the correlation between all regions by computing a global correlation feature as a weighted sum of all regional features, with the weights being calculated as the similarity between the corresponding region pairs.

Paper
Add Code
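
The global correlation step described in the entry above (a weighted sum of all regional features, with weights derived from region-pair similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `global_correlation_features`, the dot-product similarity, and the softmax normalization of the weights are all assumptions made for illustration, since the snippet does not specify them.

```python
import math


def global_correlation_features(region_features):
    """Return, for each region, a similarity-weighted sum of all regional features.

    Hypothetical helper: dot-product similarity and softmax weights are assumptions.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    outputs = []
    for query in region_features:
        # Similarity between this region and every region (including itself).
        scores = [dot(query, other) for other in region_features]
        # Softmax over the scores so the weights are positive and sum to 1.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all regional features, one output dimension at a time.
        dim = len(query)
        outputs.append(
            [sum(w * feat[d] for w, feat in zip(weights, region_features))
             for d in range(dim)]
        )
    return outputs
```

With two orthogonal regions, each output leans toward its own region's feature; with identical regions, each output equals the shared feature vector.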

ROSA: Robust Salient Object Detection against Adversarial Attacks

no code implementations • 9 May 2019 • Haofeng Li, Guanbin Li, Yizhou Yu

To our knowledge, this paper is the first one that mounts successful adversarial attacks on salient object detection models and verifies that adversarial samples are effective on a wide range of existing methods.

Object object-detection +2

Paper
Add Code

Face Hallucination by Attentive Sequence Optimization with Reinforcement Learning

no code implementations • 4 May 2019 • Yukai Shi, Guanbin Li, Qingxing Cao, Keze Wang, Liang Lin

Face hallucination is a domain-specific super-resolution problem that aims to generate a high-resolution (HR) face image from a low-resolution (LR) input.

Face Hallucination Hallucination +3

Paper
Add Code

Semantic Relationships Guided Representation Learning for Facial Action Unit Recognition

no code implementations • 22 Apr 2019 • Guanbin Li, Xin Zhu, Yirui Zeng, Qing Wang, Liang Lin

Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of structured knowledge-graph and integrate a Gated Graph Neural Network (GGNN) in a multi-scale CNN framework to propagate node information through the graph for generating enhanced AU representation.

Facial Action Unit Detection Representation Learning

Paper
Add Code

Harvesting Visual Objects from Internet Images via Deep Learning Based Objectness Assessment

no code implementations • 1 Apr 2019 • Kan Wu, Guanbin Li, Haofeng Li, Jianjun Zhang, Yizhou Yu

As a concrete example, a database of over 1.2 million visual objects has been built using the proposed method, and has been successfully used in various data-driven image applications.

Image Generation Object +1

Paper
Add Code

Deep RBFNet: Point Cloud Feature Learning using Radial Basis Functions

no code implementations • 11 Dec 2018 • Weikai Chen, Xiaoguang Han, Guanbin Li, Chao Chen, Jun Xing, Yajie Zhao, Hao Li

Three-dimensional object recognition has recently achieved great progress thanks to the development of effective point cloud-based learning frameworks, such as PointNet and its extensions.

3D Object Recognition

Paper
Add Code

Facial Landmark Machines: A Backbone-Branches Architecture with Progressive Representation Learning

no code implementations • 10 Dec 2018 • Lingbo Liu, Guanbin Li, Yuan Xie, Yizhou Yu, Qing Wang, Liang Lin

In this paper, we propose a novel cascaded backbone-branches fully convolutional neural network (BB-FCN) for rapidly and accurately localizing facial landmarks in unconstrained and cluttered settings.

Face Alignment Face Detection +2

Paper
Add Code

FRAME Revisited: An Interpretation View Based on Particle Evolution

no code implementations • 4 Dec 2018 • Xu Cai, Yang Wu, Guanbin Li, Ziliang Chen, Liang Lin

FRAME (Filters, Random fields, And Maximum Entropy) is an energy-based descriptive model that synthesizes visual realism by capturing mutual patterns from structural input signals.

Descriptive

Paper
Add Code

Larger Norm More Transferable: An Adaptive Feature Norm Approach for Unsupervised Domain Adaptation

3 code implementations • ICCV 2019 • Ruijia Xu, Guanbin Li, Jihan Yang, Liang Lin

Domain adaptation enables the learner to safely generalize into novel environments by mitigating domain shifts across distributions.

Ranked #7 on Domain Adaptation on ImageCLEF-DA

Partial Domain Adaptation Transfer Learning +1

Paper
Code

Cross-Modal Attentional Context Learning for RGB-D Object Detection

no code implementations • 30 Oct 2018 • Guanbin Li, Yukang Gan, Hejun Wu, Nong Xiao, Liang Lin

In this paper, we address this problem by developing a Cross-Modal Attentional Context (CMAC) learning framework, which enables the full exploitation of the context information from both RGB and depth data.

Autonomous Driving Object +2

Paper
Add Code

Learning Deep Representations for Semantic Image Parsing: a Comprehensive Overview

no code implementations • 10 Oct 2018 • Lili Huang, Jiefeng Peng, Ruimao Zhang, Guanbin Li, Liang Lin

Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing the structure representation of the input, has recently aroused widespread interest in the field of computer vision.

Representation Learning Segmentation +1

Paper
Add Code

Attentive Crowd Flow Machines

no code implementations • 1 Sep 2018 • Lingbo Liu, Ruimao Zhang, Jiefeng Peng, Guanbin Li, Bowen Du, Liang Lin

Traffic flow prediction is crucial for urban traffic management and public safety.

Management

Paper
Add Code

Non-locally Enhanced Encoder-Decoder Network for Single Image De-raining

no code implementations • 4 Aug 2018 • Guanbin Li, Xiang He, Wei Zhang, Huiyou Chang, Le Dong, Liang Lin

Single image rain streaks removal has recently witnessed substantial progress due to the development of deep convolutional neural networks.

Decoder

Paper
Add Code

Crowd Counting using Deep Recurrent Spatial-Aware Network

no code implementations • 2 Jul 2018 • Lingbo Liu, Hongjun Wang, Guanbin Li, Wanli Ouyang, Liang Lin

Crowd counting from unconstrained scene images is a crucial task in many real-world applications like urban surveillance and management, but it is greatly challenged by the camera's perspective that causes huge appearance variations in people's scales and rotations.

Crowd Counting Management

Paper
Add Code

Interpretable Video Captioning via Trajectory Structured Localization

no code implementations • CVPR 2018 • Xian Wu, Guanbin Li, Qingxing Cao, Qingge Ji, Liang Lin

Automatically describing open-domain videos with natural language is attracting increasing interest in the field of artificial intelligence.

Decoder Image Captioning +3

Paper
Add Code

Flow Guided Recurrent Neural Encoder for Video Salient Object Detection

no code implementations • CVPR 2018 • Guanbin Li, Yuan Xie, Tianhao Wei, Keze Wang, Liang Lin

Image saliency detection has recently witnessed significant progress due to deep convolutional neural networks.

Ranked #2 on Video Salient Object Detection on DAVSOD-Difficult20 (using extra training data)

Object object-detection +4

Paper
Add Code

Visual Question Reasoning on General Dependency Tree

no code implementations • CVPR 2018 • Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin

This network comprises two collaborative modules: i) an adversarial attention module to exploit the local visual evidence for each word parsed from the question; ii) a residual composition module to compose the previously mined evidence.

Question Answering Visual Question Answering

Paper
Add Code

Contrast-Oriented Deep Neural Networks for Salient Object Detection

no code implementations • 30 Mar 2018 • Guanbin Li, Yizhou Yu

In this paper, we develop hybrid contrast-oriented deep neural networks to overcome the aforementioned limitations.

Object object-detection +2

Paper
Add Code

Weakly Supervised Salient Object Detection Using Image Labels

no code implementations • 17 Mar 2018 • Guanbin Li, Yuan Xie, Liang Lin

Our algorithm is based on alternately exploiting a graphical model and training a fully convolutional network for model updating.

Object object-detection +3

Paper
Add Code

Context-Aware Semantic Inpainting

no code implementations • 21 Dec 2017 • Haofeng Li, Guanbin Li, Liang Lin, Yizhou Yu

Our proposed GAN-based framework consists of a fully convolutional design for the generator which helps to better preserve spatial structures and a joint loss function with a revised perceptual loss to capture high-level semantics in the context.

Generative Adversarial Network Image Inpainting

Paper
Add Code

Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition

no code implementations • 20 Dec 2017 • Tianshui Chen, Zhouxia Wang, Guanbin Li, Liang Lin

Recognizing multiple labels of images is a fundamental but challenging task in computer vision, and remarkable progress has been attained by localizing semantic-aware image regions and predicting their labels with deep convolutional neural networks.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Multi-label Image Recognition by Recurrently Discovering Attentional Regions

no code implementations • ICCV 2017 • Zhouxia Wang, Tianshui Chen, Guanbin Li, Ruijia Xu, Liang Lin

This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding.

General Classification Multi-Label Image Classification +1

Paper
Add Code

Attention-Aware Face Hallucination via Deep Reinforcement Learning

no code implementations • CVPR 2017 • Qingxing Cao, Liang Lin, Yukai Shi, Xiaodan Liang, Guanbin Li

Face hallucination is a domain-specific super-resolution problem with the goal to generate high-resolution (HR) faces from low-resolution (LR) input images.

Face Hallucination Hallucination +3

Paper
Add Code

Instance-Level Salient Object Segmentation

no code implementations • CVPR 2017 • Guanbin Li, Yuan Xie, Liang Lin, Yizhou Yu

Image saliency detection has recently witnessed rapid progress due to deep convolutional neural networks.

Ranked #15 on RGB Salient Object Detection on DUTS-TE (max F-measure metric)

Instance Segmentation Object +3

Paper
Add Code

Visual Saliency Detection Based on Multiscale Deep CNN Features

2 code implementations • 7 Sep 2016 • Guanbin Li, Yizhou Yu

The penultimate layer of our neural network has been confirmed to be a discriminative high-level feature vector for saliency detection, which we call deep contrast feature.

Saliency Detection

Paper
Code

Deep Contrast Learning for Salient Object Detection

no code implementations • CVPR 2016 • Guanbin Li, Yizhou Yu

Our deep network consists of two complementary components, a pixel-level fully convolutional stream and a segment-wise spatial pooling stream.

Ranked #19 on RGB Salient Object Detection on DUTS-TE (max F-measure metric)

Object object-detection +2

Paper
Add Code

Visual Saliency Based on Multiscale Deep Features

no code implementations • CVPR 2015 • Guanbin Li, Yizhou Yu

Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision.

Image Segmentation Semantic Segmentation

Paper
Add Code
