1 code implementation • 24 Apr 2024 • Zhong Ji, Yimu Su, Yan Zhang, Jiacheng Hou, Yanwei Pang, Jungong Han
Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series, offering significant time and labor savings compared to manual frame-by-frame removal.
no code implementations • 29 Mar 2024 • Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Zeke Xie, Yunfeng Cai, Jiale Cao, Zhong Ji, Mingming Sun
To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging priors from a Diffusion Model along with complementary multi-modal data.
no code implementations • 19 Mar 2024 • Jingren Liu, Zhong Ji, Yanwei Pang, Yunlong Yu
While anti-amnesia FSCIL learners often excel in incremental sessions, they tend to prioritize mitigating knowledge attrition over harnessing the model's potential for knowledge acquisition.
1 code implementation • 26 Jun 2023 • Zhong Ji, Zhihao LI, Yan Zhang, Haoran Wang, Yanwei Pang, Xuelong Li
Afterwards, the VR module is developed to mine the potential semantic correlations among multiple region-query pairs, further exploring the high-level reasoning similarity.
1 code implementation • 17 Jan 2023 • Yan Zhang, Zhong Ji, Di Wang, Yanwei Pang, Xuelong Li
(2) It limits the scale of negative sample pairs by employing a mini-batch-based end-to-end training mechanism.
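To make this limitation concrete, here is a minimal sketch of a generic in-batch contrastive objective (an illustrative InfoNCE, not the paper's exact loss): with a batch of B image-text pairs, each anchor sees only the B-1 in-batch negatives, so the negative pool is capped by the batch size.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Illustrative InfoNCE with in-batch negatives only.

    For a batch of B image-text pairs, each anchor sees just B-1
    negatives, which is the scale limitation the excerpt refers to.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric loss: image-to-text and text-to-image directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```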
no code implementations • 6 Jan 2023 • Di Wang, Ping Wang, Zhong Ji, Xiaojun Yang, Hongyue Li
Conformal prediction is a learning framework that controls the coverage of prediction sets and can be built on top of any learning algorithm for point prediction.
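As a concrete illustration of the framework described above, the following is a minimal sketch of split conformal prediction for regression; the function name and the regression setting are assumptions for illustration, not the paper's specific construction.

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    """Split conformal prediction for regression: wrap any point
    predictor so its intervals cover the truth with probability
    at least 1 - alpha.

    residuals_cal: |y - y_hat| on a held-out calibration set.
    y_pred_test: point predictions on the test inputs.
    """
    n = len(residuals_cal)
    # Finite-sample-corrected quantile of the calibration residuals.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(residuals_cal, q_level, method="higher")
    return y_pred_test - q, y_pred_test + q
```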
1 code implementation • 26 Nov 2022 • Zhong Ji, Junhua Hu, Deyin Liu, Lin Yuanbo Wu, Ye Zhao
To accomplish this task, one needs to extract multi-scale features from both the image and text domains, and then perform cross-modal alignment.
1 code implementation • Conference 2022 • Jin Li, Zhong Ji, Gang Wang, Qiang Wang, Feng Gao
The goal of General Continual Learning (GCL) is to preserve learned knowledge and learn new knowledge with constant memory from an infinite data stream where task boundaries are blurry.
no code implementations • 21 Aug 2022 • Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang
We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.
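A minimal sketch of the dynamic-dictionary idea, assuming a MoCo-style feature queue and a softmax-based hardness weighting as a stand-in for the adaptive negative pair weighting named above; the sizes, names, and weighting scheme are illustrative, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

class DynamicDictionary:
    """MoCo-style feature queue: enlarges the pool of negatives
    beyond the current mini-batch (generic sketch)."""

    def __init__(self, dim=256, size=4096):
        self.queue = F.normalize(torch.randn(size, dim), dim=-1)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, feats):
        # Overwrite the oldest entries with the newest features.
        n = feats.size(0)
        idx = (self.ptr + torch.arange(n)) % self.queue.size(0)
        self.queue[idx] = F.normalize(feats, dim=-1)
        self.ptr = (self.ptr + n) % self.queue.size(0)

    def negative_logits(self, anchors, temperature=0.07):
        """Similarities to all queued negatives, plus adaptive weights:
        harder negatives (higher similarity) get larger weights.
        (Illustrative weighting, not the paper's exact scheme.)"""
        sims = F.normalize(anchors, dim=-1) @ self.queue.t()   # (B, K)
        weights = F.softmax(sims / temperature, dim=-1)
        return sims, weights
```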
no code implementations • 11 Aug 2022 • Zhong Ji, Zhishen Hou, Xiyao Liu, Yanwei Pang, Xuelong Li
Few-shot Class-Incremental Learning (FSCIL) aims at learning new concepts continually from only a few samples, and is prone to catastrophic forgetting and overfitting.
no code implementations • 8 Aug 2022 • Haoran Wang, Di Xu, Dongliang He, Fu Li, Zhong Ji, Jungong Han, Errui Ding
Video-text retrieval (VTR) is an attractive yet challenging task for multi-modal understanding, which aims to retrieve the relevant video (text) given a text (video) query.
no code implementations • 21 May 2022 • Zhong Ji, Zhenfei Hu, Yaodong Wang, Shengjia Li
Pedestrian Attribute Recognition (PAR) is a challenging task in intelligent video surveillance.
1 code implementation • 3 Sep 2021 • Zhong Ji, Jin Li, Qiang Wang, Zhongfei Zhang
Furthermore, we explore a collaborative self-supervision idea to leverage pretext tasks and supervised contrastive learning for addressing the feature deviation problem by learning complete and discriminative features for all classes.
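For reference, a minimal sketch of the supervised contrastive term named above (a generic Khosla-style SupCon loss); this is not the paper's exact collaborative objective, and the pretext-task branch is omitted.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """Minimal supervised contrastive loss: same-class samples are
    pulled together, different-class samples pushed apart."""
    f = F.normalize(features, dim=-1)
    n = f.size(0)
    sim = f @ f.t() / temperature
    # Mask out self-comparisons.
    self_mask = torch.eye(n, dtype=torch.bool, device=f.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-likelihood of positives per anchor.
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_counts
    has_pos = pos_mask.sum(1) > 0
    return loss[has_pos].mean()
```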
no code implementations • 3 Sep 2021 • Xiyao Liu, Zhong Ji, Yanwei Pang, Zhongfei Zhang
However, the target domain is entirely unknown during training on the source domain, leaving no direct guidance for the target tasks.
Cross-Domain Few-Shot Learning • Weakly-Supervised Object Localization
no code implementations • 3 Sep 2021 • Zhong Ji, Zhishen Hou, Xiyao Liu, Yanwei Pang, Jungong Han
Semantic information provides intra-class consistency and inter-class discriminability beyond visual concepts, which has been employed in Few-Shot Learning (FSL) to achieve further gains.
no code implementations • 11 Jun 2021 • Zhong Ji, Kexin Chen, Haoran Wang
Image-text matching plays a central role in bridging the semantic gap between vision and language.
1 code implementation • ECCV 2020 • Haoran Wang, Ying Zhang, Zhong Ji, Yanwei Pang, Lin Ma
In this paper, we propose a Consensus-aware Visual-Semantic Embedding (CVSE) model to incorporate the consensus information, namely the commonsense knowledge shared between both modalities, into image-text matching.
1 code implementation • 19 Jan 2020 • Shizhen Zhao, Changxin Gao, Yuanjie Shao, Lerenhan Li, Changqian Yu, Zhong Ji, Nong Sang
FFU and BFU add the IoU variance to the results of CFU, yielding class-specific foreground and background features, respectively.
1 code implementation • CVPR 2020 • Yunlong Yu, Zhong Ji, Zhongfei Zhang, Jungong Han
We introduce a simple yet effective episode-based training framework for zero-shot learning (ZSL), where the learning system is required to recognize unseen classes given only the corresponding class semantics.
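A minimal sketch of episodic sampling, the core mechanic of episode-based training; `class_to_indices` and the way/shot numbers are illustrative assumptions, not the paper's exact protocol.

```python
import random

def sample_episode(class_to_indices, n_way=5, k_shot=1, q_queries=5):
    """Build one training episode: sample n_way classes, then split
    each class's examples into support and query sets."""
    classes = random.sample(list(class_to_indices), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        idxs = random.sample(class_to_indices[cls], k_shot + q_queries)
        support += [(i, label) for i in idxs[:k_shot]]
        query += [(i, label) for i in idxs[k_shot:]]
    return support, query
```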
no code implementations • 26 Aug 2019 • Zhong Ji, Xuejie Yu, Yunlong Yu, Yanwei Pang, Zhongfei Zhang
To alleviate the class imbalance issue in ZSC, we propose a sample-balanced training process that encourages all training classes to contribute equally to the learned model.
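One common way to realize such sample-balanced training is inverse-frequency sampling; the sketch below uses PyTorch's WeightedRandomSampler and is a generic stand-in, not necessarily the paper's procedure.

```python
import numpy as np
from torch.utils.data import WeightedRandomSampler

def balanced_sampler(labels):
    """Class-balanced sampling: weight each example inversely to its
    class frequency so every class contributes equally per epoch in
    expectation."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    weights = 1.0 / counts[labels]
    return WeightedRandomSampler(weights.tolist(), num_samples=len(labels))
```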
no code implementations • ICCV 2019 • Zhong Ji, Haoran Wang, Jungong Han, Yanwei Pang
Concretely, the saliency detector provides the visual saliency information as the guidance for the two attention modules.
1 code implementation • NeurIPS 2018 • Yunlong Yu, Zhong Ji, Yanwei Fu, Jichang Guo, Yanwei Pang, Zhongfei (Mark) Zhang
Zero-Shot Learning (ZSL) is generally achieved via aligning the semantic relationships between the visual features and the corresponding class semantic descriptions.
no code implementations • 20 Nov 2018 • Yunlong Yu, Zhong Ji, Yanwei Pang, Jichang Guo, Zhongfei Zhang, Fei Wu
Existing generative Zero-Shot Learning (ZSL) methods consider only the unidirectional alignment from the class semantics to the visual features while ignoring the alignment from the visual features back to the class semantics, and thus fail to construct the visual-semantic interactions well.
no code implementations • 21 May 2018 • Yunlong Yu, Zhong Ji, Yanwei Fu, Jichang Guo, Yanwei Pang, Zhongfei Zhang
To this end, we propose a novel stacked semantics-guided attention (S2GA) model to obtain semantically relevant features by using individual class semantic features to progressively guide the visual features to generate an attention map for weighting the importance of different local regions.
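A minimal sketch of one semantics-guided attention stage in the spirit of S2GA; the dimensions, layer choices, and stacking details are assumptions for illustration rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SemanticGuidedAttention(nn.Module):
    """One attention stage guided by class semantics: the semantic
    vector scores each local visual region, and the weighted regions
    are pooled. Stacking such stages progressively refines the map."""

    def __init__(self, vis_dim=512, sem_dim=300, hid_dim=256):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hid_dim)
        self.sem_proj = nn.Linear(sem_dim, hid_dim)
        self.score = nn.Linear(hid_dim, 1)

    def forward(self, regions, semantics):
        # regions: (B, R, vis_dim); semantics: (B, sem_dim)
        h = torch.tanh(self.vis_proj(regions) +
                       self.sem_proj(semantics).unsqueeze(1))
        attn = torch.softmax(self.score(h).squeeze(-1), dim=1)  # (B, R)
        return (attn.unsqueeze(-1) * regions).sum(1), attn
```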
no code implementations • 6 Feb 2018 • Zhong Ji, Yuxin Sun, Yunlong Yu, Yanwei Pang, Jungong Han
To address the Cross-Modal Zero-Shot Hashing (CMZSH) retrieval task, we propose a novel Attribute-Guided Network (AgNet), which can perform not only IBIR, but also Text-Based Image Retrieval (TBIR).
no code implementations • 26 Dec 2017 • Yunlong Yu, Zhong Ji, Jichang Guo, Zhongfei Zhang
Instead of requiring a projection function to transfer information across different modalities as in most previous work, LSE performs the interactions between different modalities via a feature-aware latent space, which is learned in an implicit way.
no code implementations • 31 Aug 2017 • Zhong Ji, Kailin Xiong, Yanwei Pang, Xuelong Li
This paper addresses the problem of supervised video summarization by formulating it as a sequence-to-sequence learning problem, where the input is a sequence of original video frames and the output is a keyshot sequence.
Ranked #4 on Video Summarization on TvSum (using extra training data)
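A minimal sketch of the sequence-to-sequence formulation, assuming pre-extracted frame features and a BiLSTM encoder with a per-frame keyshot-score head; this is an illustrative stand-in, not the paper's exact attention-based model.

```python
import torch
import torch.nn as nn

class Seq2SeqSummarizer(nn.Module):
    """Sequence-to-sequence sketch for supervised video summarization:
    a BiLSTM encodes the frame features and a per-step head predicts
    keyshot scores."""

    def __init__(self, feat_dim=1024, hid_dim=256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hid_dim, batch_first=True,
                               bidirectional=True)
        self.head = nn.Linear(2 * hid_dim, 1)

    def forward(self, frames):            # frames: (B, T, feat_dim)
        enc, _ = self.encoder(frames)     # (B, T, 2*hid_dim)
        return torch.sigmoid(self.head(enc)).squeeze(-1)  # keyshot scores
```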
no code implementations • 13 Jul 2017 • Zhong Ji, Yaru Ma, Yanwei Pang, Xuelong Li
Given the explosive growth of online videos, it is becoming increasingly important to relieve the tedious work of browsing and managing the video content of interest.
no code implementations • 22 May 2017 • Zhong Ji, Yunxin Sun, Yunlong Yu, Jichang Guo, Yanwei Pang
However, the visual features and the class semantic descriptors lie in different structural spaces, so a linear or bilinear model cannot capture the semantic interactions between the modalities well.
no code implementations • 27 Mar 2017 • Yunlong Yu, Zhong Ji, Xi Li, Jichang Guo, Zhongfei Zhang, Haibin Ling, Fei Wu
As an important and challenging problem in computer vision, zero-shot learning (ZSL) aims at automatically recognizing the instances from unseen object classes without training data.
no code implementations • 27 Mar 2017 • Yunlong Yu, Zhong Ji, Jichang Guo, Yanwei Pang
Its two fundamental challenges are visual-semantic embedding in the cross-modality learning step and domain adaptation in the unseen-class prediction step.
no code implementations • 30 Jun 2016 • Zhong Ji, Yuzhong Xie, Yanwei Pang, Lei Chen, Zhongfei Zhang
Zero-shot learning (ZSL) extends the conventional image classification technique to a more challenging situation where the test image categories are not seen in the training samples.