no code implementations • 30 Mar 2024 • YuAn Wang, Rui Sun, Naisong Luo, Yuwen Pan, Tianzhu Zhang
Open-vocabulary semantic segmentation (OVS) aims to segment images of arbitrary categories specified by class labels or captions.
1 code implementation • 28 Mar 2024 • Xiao Lin, Wenfei Yang, Yuan Gao, Tianzhu Zhang
(2) The second design is a Geometric-Aware Feature Aggregation module, which can efficiently integrate the local and global geometric information into keypoint features.
no code implementations • 25 Mar 2024 • Jiacheng Deng, Jiahao Lu, Tianzhu Zhang
Unsupervised point cloud shape correspondence aims to establish point-wise correspondences between source and target point clouds.
1 code implementation • 22 Mar 2024 • Jiahao Lu, Jiacheng Deng, Tianzhu Zhang
To generate higher quality pseudo-labels and achieve more precise weakly supervised 3DIS results, we propose the Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation (BSNet), which devises a novel pseudo-labeler called Simulation-assisted Transformer.
no code implementations • 7 Mar 2024 • Yu Zhu, Chuxiong Sun, Wenfei Yang, Wenqiang Wei, Bo Tang, Tianzhu Zhang, Zhiyu Li, Shifeng Zhang, Feiyu Xiong, Jie Hu, MingChuan Yang
Reinforcement Learning from Human Feedback (RLHF) is the prevailing approach to ensure Large Language Models (LLMs) align with human values.
no code implementations • 1 Mar 2024 • Xin Liu, Jiamin Wu, Tianzhu Zhang
To address this issue, we propose a Multi-modal Attribute Prompting method (MAP) by jointly exploring textual attribute prompting, visual attribute prompting, and attribute-level alignment.
no code implementations • 5 Feb 2024 • Xiaoheng Jiang, Feng Yan, Yang Lu, Ke Wang, Shuai Guo, Tianzhu Zhang, Yanwei Pang, Jianwei Niu, Mingliang Xu
To address these issues, we propose a joint attention-guided feature fusion network (JAFFNet) for saliency detection of surface defects based on the encoder-decoder network.
1 code implementation • 20 Jan 2024 • Yinchao Ma, Yuyang Tang, Wenfei Yang, Tianzhu Zhang, Jinpeng Zhang, Mengxue Kang
Single object tracking aims to locate the target object in a video sequence according to the state specified by different modal references, including the initial bounding box (BBOX), natural language (NL), or both (NL+BBOX).
no code implementations • 3 Jan 2024 • Yulin Li, Tianzhu Zhang, Yongdong Zhang
Visible-infrared person re-identification (VI-ReID) is challenging due to the significant cross-modality discrepancies between visible and infrared images.
no code implementations • ICCV 2023 • Chuxin Wang, Wenfei Yang, Tianzhu Zhang
Semi-supervised 3D object detection from point clouds aims to train a detector with a small amount of labeled data and a large amount of unlabeled data.
2 code implementations • 15 Dec 2023 • Ruijie Zhu, Jiahao Chang, Ziyang Song, Jiahuan Yu, Tianzhu Zhang
This report describes the solution that secured the first place in the "View Synthesis Challenge for Human Heads (VSCHH)" at the ICCV 2023 workshop.
1 code implementation • NeurIPS 2023 • YuAn Wang, Naisong Luo, Tianzhu Zhang
In this paper, we rethink the importance of support information and propose a new query-centric FSS model Adversarial Mining Transformer (AMFormer), which achieves accurate query image segmentation with only rough support guidance or even weak support labels.
1 code implementation • 24 Oct 2023 • Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang
It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.
no code implementations • 12 Oct 2023 • Ziyang Song, Ruijie Zhu, Chuxin Wang, Jiacheng Deng, Jianfeng He, Tianzhu Zhang
Self-supervised monocular depth estimation holds significant importance in the fields of autonomous driving and robotics.
Ranked #1 on Unsupervised Monocular Depth Estimation on KITTI-C (using extra training data)
no code implementations • 22 Aug 2023 • Jilong Wang, Saihui Hou, Yan Huang, Chunshui Cao, Xu Liu, Yongzhen Huang, Tianzhu Zhang, Liang Wang
Gait recognition seeks correct matches for query individuals based on their unique walking patterns.
1 code implementation • 27 Jul 2023 • Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Ding Zhao, Liangjun Zhang, Hesheng Wang, Wei Tsang Ooi, Ruijie Zhu, Ziyang Song, Li Liu, Tianzhu Zhang, Jun Yu, Mohan Jing, Pengwei Li, Xiaohua Qi, Cheng Jin, Yingfeng Chen, Jie Hou, Jie Zhang, Zhen Kan, Qiang Ling, Liang Peng, Minglei Li, Di Xu, Changpeng Yang, Yuanqi Yao, Gang Wu, Jian Kuai, Xianming Liu, Junjun Jiang, Jiamian Huang, Baojun Li, Jiale Chen, Shuang Zhang, Sun Ao, Zhenyu Li, Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu
In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation.
1 code implementation • CVPR 2023 • Huan Ren, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang
Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training.
Ranked #2 on Weakly Supervised Action Localization on THUMOS’14
Multiple Instance Learning • Weakly Supervised Action Localization • +2
1 code implementation • CVPR 2023 • Jiacheng Deng, Chuxin Wang, Jiahao Lu, Jianfeng He, Tianzhu Zhang, Jiyang Yu, Zhe Zhang
The key of our approach is to exploit an orientation estimation module with a domain adaptive discriminator to align the orientations of point cloud pairs, which significantly alleviates the mispredictions of symmetrical parts.
Ranked #2 on 3D Dense Shape Correspondence on SHREC'19 (using extra training data)
no code implementations • CVPR 2023 • Jiahuan Yu, Jiahao Chang, Jianfeng He, Tianzhu Zhang, Feng Wu
To deal with the above issues, we propose Adaptive Spot-Guided Transformer (ASTR) for local feature matching, which jointly models the local consistency and scale variations in a unified coarse-to-fine architecture.
no code implementations • 29 Mar 2023 • Jiahao Chang, Jiahuan Yu, Tianzhu Zhang
Local feature matching is challenging due to textureless and repetitive patterns.
no code implementations • CVPR 2023 • YuAn Wang, Rui Sun, Tianzhu Zhang
In this work, we rethink how to mitigate the false matches from the perspective of representative reference features (referred to as buoys), and propose a novel adaptive buoys correlation (ABC) network to rectify direct pairwise pixel-level correlation, including a buoys mining module and an adaptive correlation module.
no code implementations • ICCV 2023 • Dawei Yang, Jianfeng He, Yinchao Ma, Qianjin Yu, Tianzhu Zhang
To address the above limitations, we propose a novel foreground-background distribution modeling transformer for visual object tracking (F-BDMTrack), including a fore-background agent learning (FBAL) module and a distribution-aware attention (DA2) module in a unified transformer architecture.
no code implementations • ICCV 2023 • Rui Sun, YuAn Wang, Huayu Mai, Tianzhu Zhang, Feng Wu
In this work, we reconcile the inherent tension of spatial and temporal information to retrieve memory frame information along the object trajectory, and propose a novel and coherent Trajectory Memory Retrieval Network (TMRN) to equip with the trajectory information, including a spatial alignment module and a temporal aggregation module.
no code implementations • CVPR 2023 • Tianyu Chang, Xun Yang, Tianzhu Zhang, Meng Wang
In this way, we can prevent the model from exploiting the artifacts of synthetic stereo images as shortcut features, thereby estimating the disparity maps more effectively based on the learned robust and shortcut-invariant representation.
no code implementations • CVPR 2023 • Naisong Luo, Yuwen Pan, Rui Sun, Tianzhu Zhang, Zhiwei Xiong, Feng Wu
To address these challenges, we propose a novel De-camouflaging Network (DCNet) including a pixel-level camouflage decoupling module and an instance-level camouflage suppression module.
no code implementations • CVPR 2023 • Jianfeng He, Yuan Gao, Tianzhu Zhang, Zhe Zhang, Feng Wu
Second, the HKDL module can generate keypoint detectors in a hierarchical way, which is helpful for detecting keypoints with diverse levels of structures.
no code implementations • ICCV 2023 • Xi Wei, Zhangxiang Shi, Tianzhu Zhang, Xiaoyuan Yu, Lei Xiao
Scene boundary detection breaks down long videos into meaningful story-telling units and plays a crucial role in high-level video understanding.
no code implementations • CVPR 2023 • Huayu Mai, Rui Sun, Tianzhu Zhang, Zhiwei Xiong, Feng Wu
Automatic mitochondria segmentation enjoys great popularity with the development of deep learning.
no code implementations • ICCV 2023 • Jiahao Lu, Jiacheng Deng, Chuxin Wang, Jianfeng He, Tianzhu Zhang
Additionally, we design an affiliated transformer decoder that suppresses the interference of noise background queries and helps the foreground queries focus on instance discriminative parts to predict final segmentation results.
Ranked #3 on 3D Instance Segmentation on ScanNet(v2)
no code implementations • ICCV 2023 • Yuwen Pan, Naisong Luo, Rui Sun, Meng Meng, Tianzhu Zhang, Zhiwei Xiong, Yongdong Zhang
Mitochondria, as tiny structures within the cell, are of great importance for studying cell functions in biological and clinical analysis.
no code implementations • CVPR 2023 • Weiwei Feng, Nanqing Xu, Tianzhu Zhang, Yongdong Zhang
Concretely, the former adopts a dynamic convolution kernel and a static convolution kernel for the specific instance and the global dataset, respectively, which can inherit the advantages of both instance-specific and instance-agnostic attacks.
no code implementations • 29 Dec 2022 • Tianzhu Zhang, Han Qiu, Gabriele Castellano, Myriana Rifai, Chung Shue Chen, Fabio Pianese
This paper aims to provide a comprehensive survey on log parsing.
no code implementations • ECCV 2022 • Kongzhu Jiang, Tianzhu Zhang, Xiang Liu, Bingqiao Qian, Yongdong Zhang, Feng Wu
To alleviate the above issues, we propose a novel Cross-Modality Transformer (CMT) to jointly explore a modality-level alignment module and an instance-level module for VI-ReID.
1 code implementation • CVPR 2022 • Jinsheng Wang, Yinchao Ma, Shaofei Huang, Tianrui Hui, Fei Wang, Chen Qian, Tianzhu Zhang
Earlier works follow a top-down roadmap to regress predefined anchors into various shapes of lane lines, which lacks enough flexibility to fit complex shapes of lanes due to the fixed anchor shapes.
Ranked #4 on Lane Detection on TuSimple (F1 score metric)
no code implementations • CVPR 2022 • Jiamin Wu, Tianzhu Zhang, Zhe Zhang, Feng Wu, Yongdong Zhang
To address this issue, we propose an end-to-end Motion-modulated Temporal Fragment Alignment Network (MTFAN) by jointly exploring the task-specific motion modulation and the multi-level temporal fragment alignment for Few-Shot Action Recognition (FSAR).
no code implementations • 23 Nov 2021 • Pengfei Zhu, Hongtao Yu, Kaihua Zhang, Yu Wang, Shuai Zhao, Lei Wang, Tianzhu Zhang, QinGhua Hu
To address this issue, segmentation-based trackers have been proposed that employ per-pixel matching to improve the tracking performance of deformable objects effectively.
1 code implementation • ICCV 2021 • Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Junjie Yan, Wanli Ouyang
In this paper, we propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
3D Object Detection From Monocular Images • Depth Estimation • +3
no code implementations • CVPR 2021 • Wenfei Yang, Tianzhu Zhang, Xiaoyuan Yu, Tian Qi, Yongdong Zhang, Feng Wu
To alleviate this problem, we propose a novel Uncertainty Guided Collaborative Training (UGCT) strategy, which mainly includes two key designs: (1) The first design is an online pseudo label generation module, in which the RGB and FLOW streams work collaboratively to learn from each other.
no code implementations • CVPR 2021 • Rui Sun, Yihao Li, Tianzhu Zhang, Zhendong Mao, Feng Wu, Yongdong Zhang
First, to the best of our knowledge, this is the first work to formulate lesion discovery as a weakly supervised lesion localization problem via a transformer decoder.
no code implementations • CVPR 2021 • Yulin Li, Jianfeng He, Tianzhu Zhang, Xiang Liu, Yongdong Zhang, Feng Wu
To address these issues, we propose a novel end-to-end Part-Aware Transformer (PAT) for occluded person Re-ID through diverse part discovery via a transformer encoder-decoder architecture, including a pixel context based transformer encoder and a part prototype based transformer decoder.
no code implementations • CVPR 2021 • Wang Luo, Tianzhu Zhang, Wenfei Yang, Jingen Liu, Tao Mei, Feng Wu, Yongdong Zhang
In this paper, we present an Action Unit Memory Network (AUMN) for weakly supervised temporal action localization, which can mitigate the above two challenges by learning an action unit memory bank.
Ranked #7 on Weakly Supervised Action Localization on THUMOS14
Weakly Supervised Action Localization • Weakly-supervised Temporal Action Localization • +1
no code implementations • ICCV 2021 • Weiwei Feng, Baoyuan Wu, Tianzhu Zhang, Yong Zhang, Yongdong Zhang
To tackle these issues, we propose a class-agnostic and model-agnostic physical adversarial attack model (Meta-Attack), which is able to not only generate robust physical adversarial examples by simulating color and shape distortions, but also generalize to attacking novel images and novel DNN models by accessing a few digital and physical images.
no code implementations • ICCV 2021 • Meng Meng, Tianzhu Zhang, Qi Tian, Yongdong Zhang, Feng Wu
To the best of our knowledge, this is the first work that can achieve remarkable performance for both tasks by optimizing them jointly via FAM for WSOL.
no code implementations • ICCV 2021 • Jiamin Wu, Tianzhu Zhang, Yongdong Zhang, Feng Wu
The task-aware part filters can adapt to any individual task and automatically mine task-related local parts even for an unseen task.
1 code implementation • 3 Aug 2020 • Yehui Yang, Fangxin Shang, Binghong Wu, Dalu Yang, Lei Wang, Yanwu Xu, Wensheng Zhang, Tianzhu Zhang
As a result, it exploits more discriminative features for DR grading.
1 code implementation • CVPR 2020 • Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang
The GSMN explicitly models objects, relations, and attributes as structured phrases, which not only allows the correspondence of objects, relations, and attributes to be learned separately, but also helps learn the fine-grained correspondence of structured phrases.
Ranked #17 on Cross-Modal Retrieval on Flickr30k
no code implementations • CVPR 2020 • Yan Lu, Yue Wu, Bin Liu, Tianzhu Zhang, Baopu Li, Qi Chu, Nenghai Yu
In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost the re-identification performance.
Cross-Modality Person Re-identification • Person Re-Identification
1 code implementation • ICCV 2019 • Guan'an Wang, Tianzhu Zhang, Jian Cheng, Si Liu, Yang Yang, Zeng-Guang Hou
First, it can exploit pixel alignment and feature alignment jointly.
Cross-Modality Person Re-identification • Generative Adversarial Network • +2
1 code implementation • Proceedings of the AAAI Conference on Artificial Intelligence 2019 • Junyu Gao, Tianzhu Zhang, Changsheng Xu
To effectively leverage the knowledge graph, we design a novel Two-Stream Graph Convolutional Network (TS-GCN) consisting of a classifier branch and an instance branch.
Ranked #5 on Zero-Shot Action Recognition on Olympics
no code implementations • 25 May 2019 • Ting-Ting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras
Temporal action localization has recently attracted significant interest in the Computer Vision community.
no code implementations • 25 Nov 2018 • Xiao Wang, Chenglong Li, Rui Yang, Tianzhu Zhang, Jin Tang, Bin Luo
To refine the target's state and re-track the target when it returns to view after heavy occlusion or after leaving the field of view, we design a novel subnetwork that learns target-driven visual attention from the guidance of both visual and natural language cues.
no code implementations • CVPR 2018 • Feifei Zhang, Tianzhu Zhang, Qirong Mao, Changsheng Xu
First, the encoder-decoder structure of the generator can learn a generative and discriminative identity representation for face images.
no code implementations • CVPR 2017 • Tianzhu Zhang, Changsheng Xu, Ming-Hsuan Yang
In this paper, we propose a multi-task correlation particle filter (MCPF) for robust visual tracking.
no code implementations • CVPR 2016 • Adel Bibi, Tianzhu Zhang, Bernard Ghanem
In this paper, we present a part-based sparse tracker in a particle filter framework where both the motion and appearance model are formulated in 3D.
no code implementations • CVPR 2016 • Tianzhu Zhang, Adel Bibi, Bernard Ghanem
Sparse representation has been introduced to visual tracking by finding the best target candidate with minimal reconstruction error within the particle filter framework.
no code implementations • CVPR 2016 • Si Liu, Tianzhu Zhang, Xiaochun Cao, Changsheng Xu
In this paper, we propose a novel structural correlation filter (SCF) model for robust visual tracking.
no code implementations • CVPR 2015 • Tianzhu Zhang, Si Liu, Changsheng Xu, Shuicheng Yan, Bernard Ghanem, Narendra Ahuja, Ming-Hsuan Yang
Sparse representation has been applied to visual tracking by finding the best target candidate with minimal reconstruction error by use of target templates.
no code implementations • CVPR 2014 • Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
The proposed part matching tracker (PMT) has a number of attractive properties.
no code implementations • 31 Mar 2014 • Kui Jia, Tsung-Han Chan, Zinan Zeng, Shenghua Gao, Gang Wang, Tianzhu Zhang, Yi Ma
The task is to identify the inlier features and establish their consistent correspondences across the image set.