2 code implementations • 7 Mar 2024 • Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazaki, Hao Chen, Xiaonan Huang, Bhiksha Raj
Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment in which intelligent systems perceive.
3 code implementations • 29 Sep 2023 • Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj
We propose a semantic decomposition method based on product quantization, where the multi-source semantics can be decomposed and represented by several disentangled and noise-suppressed single-source semantics.
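As a rough sketch of the product-quantization idea behind this decomposition (not the paper's model; the dimensions, codebooks, and function names below are illustrative), an embedding is split into subvectors and each subvector is snapped to its nearest centroid in a per-subspace codebook:

```python
import numpy as np

def pq_encode(x, codebooks):
    """Split x into len(codebooks) subvectors and assign each to its
    nearest centroid in the corresponding codebook."""
    sub = np.split(x, len(codebooks))
    return [int(np.argmin(np.linalg.norm(cb - s, axis=1)))
            for cb, s in zip(codebooks, sub)]

def pq_decode(codes, codebooks):
    """Reconstruct the vector from its per-subspace centroid codes."""
    return np.concatenate([cb[c] for cb, c in zip(codebooks, codes)])

rng = np.random.default_rng(0)
# Hypothetical setup: a 16-dim embedding, 4 subspaces, 8 centroids each.
codebooks = [rng.normal(size=(8, 4)) for _ in range(4)]
x = rng.normal(size=16)
codes = pq_encode(x, codebooks)
x_hat = pq_decode(codes, codebooks)
```

In the paper's setting, each disentangled single-source semantic would correspond to such a quantized sub-representation; here the codebooks are random rather than learned.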
no code implementations • ICCV 2023 • Yushuang Wu, Xiao Li, Jinglu Wang, Xiaoguang Han, Shuguang Cui, Yan Lu
Specifically, we use a small network similar to NeRF while preserving the rendering speed with a single network forward pass per pixel as in NeLF.
no code implementations • 26 Jul 2023 • Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj
Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion.
1 code implementation • 30 May 2023 • Xiang Li, Chung-Ching Lin, Yinpeng Chen, Zicheng Liu, Jinglu Wang, Bhiksha Raj
The paper introduces PaintSeg, a new unsupervised method for segmenting objects without any training.
no code implementations • CVPR 2023 • Yue Gao, Yuan Zhou, Jinglu Wang, Xiao Li, Xiang Ming, Yan Lu
Our method leverages both self-supervised learned landmarks and 3D face model-based landmarks to model the motion.
1 code implementation • CVPR 2023 • Kun Yan, Xiao Li, Fangyun Wei, Jinglu Wang, Chenbin Zhang, Ping Wang, Yan Lu
The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data.
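The pseudo-labeling step described above can be sketched minimally as confidence thresholding (a hedged illustration only; the threshold value and toy predictions are made up, and the paper's actual selection scheme may differ):

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Keep unlabeled frames whose max class probability exceeds the
    threshold, assigning the argmax class as the pseudo label."""
    conf = probs.max(axis=1)
    keep = conf >= threshold
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]

# Toy predictions on 4 unlabeled frames over 3 classes.
probs = np.array([[0.95, 0.03, 0.02],
                  [0.40, 0.35, 0.25],
                  [0.05, 0.92, 0.03],
                  [0.60, 0.30, 0.10]])
idx, labels = pseudo_label(probs, threshold=0.9)
# Only the confident frames (0 and 2) join the training set.
```

Training then proceeds on the union of the labeled frames and the retained pseudo-labeled frames.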
no code implementations • CVPR 2023 • Mingfang Zhang, Jinglu Wang, Xiao Li, Yifei Huang, Yoichi Sato, Yan Lu
The Multiplane Image (MPI), containing a set of fronto-parallel RGBA layers, is an effective and efficient representation for view synthesis from sparse inputs.
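Rendering from an MPI reduces to back-to-front alpha compositing of its RGBA layers. A minimal sketch of that compositing step (array shapes are illustrative; the paper's full pipeline also warps layers per target view, which is not shown here):

```python
import numpy as np

def composite_mpi(layers):
    """Alpha-composite fronto-parallel RGBA layers, back to front.
    layers: array of shape (D, H, W, 4), ordered far -> near."""
    out = np.zeros(layers.shape[1:3] + (3,))
    for layer in layers:  # iterate from far to near
        rgb, a = layer[..., :3], layer[..., 3:4]
        out = rgb * a + out * (1.0 - a)  # standard "over" operator
    return out

# Two 1x1 layers: an opaque red far plane, a half-transparent green near plane.
layers = np.zeros((2, 1, 1, 4))
layers[0] = [1.0, 0.0, 0.0, 1.0]
layers[1] = [0.0, 1.0, 0.0, 0.5]
img = composite_mpi(layers)
```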
no code implementations • ICCV 2023 • Xiang Li, Jinglu Wang, Xiaohao Xu, Xiao Li, Bhiksha Raj, Yan Lu
Our model achieves state-of-the-art performance on R-VOS benchmarks, Ref-DAVIS17 and Ref-Youtube-VOS, and also our RRYTVOS dataset.
no code implementations • 18 Aug 2022 • Gusi Te, Xiu Li, Xiao Li, Jinglu Wang, Wei Hu, Yan Lu
We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views.
no code implementations • 12 Jul 2022 • Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu
We propose a robust context fusion network to tackle VIS in an online fashion, which predicts instance segmentation frame-by-frame with a few preceding frames.
1 code implementation • 4 Jul 2022 • Xiang Li, Jinglu Wang, Xiaohao Xu, Xiao Li, Bhiksha Raj, Yan Lu
Referring Video Object Segmentation (R-VOS) is a challenging task that aims to segment an object in a video based on a linguistic expression.
Ranked #11 on Referring Video Object Segmentation on Refer-YouTube-VOS
1 code implementation • 2 Jul 2022 • Xiaohao Xu, Jinglu Wang, Xiang Ming, Yan Lu
We consolidate this conditional mask calibration process in a progressive manner, where the object representations and proto-masks evolve to be discriminative iteratively.
Ranked #1 on Visual Object Tracking on YouTube-VOS
1 code implementation • 6 Dec 2021 • Xiaohao Xu, Jinglu Wang, Xiao Li, Yan Lu
We introduce two modulators, propagation and correction modulators, to separately perform channel-wise re-calibration on the target frame embeddings according to local temporal correlations and reliable references respectively.
Ranked #3 on Video Object Segmentation on DAVIS 2017 (test-dev)
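The channel-wise re-calibration performed by the modulators can be sketched as a simple sigmoid gating of the frame embedding (a hedged illustration: in the paper the gate logits come from local temporal correlations and reliable references via the two modulators, which are not reproduced here):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_recalibrate(feat, gate_logits):
    """Re-scale each channel of a frame embedding by a per-channel gate.
    feat: (C, H, W); gate_logits: (C,), assumed to be produced by a
    small modulator network (not shown)."""
    gates = sigmoid(gate_logits)[:, None, None]
    return feat * gates

feat = np.ones((2, 2, 2))
gate_logits = np.array([0.0, 50.0])  # neutral vs. fully-open gate
out = channel_recalibrate(feat, gate_logits)
```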
no code implementations • 3 Dec 2021 • Xiang Li, Jinglu Wang, Xiao Li, Yan Lu
Based on this representation, we introduce a cropping-free temporal fusion approach to model the temporal consistency between video frames.
no code implementations • 20 Oct 2021 • Xiang Li, Jinglu Wang, Xiao Li, Yan Lu
Instance segmentation is a challenging task that aims to classify and segment all object instances of specific classes.
no code implementations • 18 Apr 2021 • Zengyi Qin, Jinglu Wang, Yan Lu
Detecting and localizing objects in real 3D space, which plays a crucial role in scene understanding, is particularly challenging given only a monocular image, due to the geometric information lost during image projection.
1 code implementation • 28 Jul 2020 • Zengyi Qin, Jinglu Wang, Yan Lu
A crucial task in scene understanding is 3D object detection, which aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
2 code implementations • 12 Jun 2020 • Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun
Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles.
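One way such uncertainty modeling can suppress background interference is to down-weight frame-level class scores by a foreground confidence derived from feature magnitude (a loose sketch under the assumption that background frames yield low-magnitude features; the constant `m` and all values below are illustrative, not the paper's formulation):

```python
import numpy as np

def suppress_background(scores, feats, m=10.0):
    """Scale per-frame class scores by a magnitude-based foreground
    confidence: small feature magnitude -> likely background -> low weight.
    scores: (T, C); feats: (T, F)."""
    mag = np.linalg.norm(feats, axis=1)
    conf = np.clip(mag / m, 0.0, 1.0)
    return scores * conf[:, None]

# Two frames: a zero-magnitude (background-like) one and a magnitude-10 one.
feats = np.array([[0.0, 0.0],
                  [6.0, 8.0]])
scores = np.ones((2, 3))
weighted = suppress_background(scores, feats, m=10.0)
```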
no code implementations • CVPR 2020 • Mingmin Zhen, Jinglu Wang, Lei Zhou, Shiwei Li, Tianwei Shen, Jiaxiang Shang, Tian Fang, Long Quan
In this paper, we present a joint multi-task learning framework for semantic segmentation and boundary detection.
1 code implementation • CVPR 2019 • Zengyi Qin, Jinglu Wang, Yan Lu
In this paper, we study the problem of 3D object detection from stereo images, in which the key challenge is how to effectively utilize stereo information.
no code implementations • 22 May 2019 • Mingmin Zhen, Jinglu Wang, Lei Zhou, Tian Fang, Long Quan
On the other hand, it learns more efficiently because gradients backpropagate more directly.
Ranked #73 on Semantic Segmentation on NYU Depth v2
1 code implementation • 26 Nov 2018 • Zengyi Qin, Jinglu Wang, Yan Lu
We propose MonoGRNet for the amodal 3D object detection from a monocular RGB image via geometric reasoning in both the observed 2D projection and the unobserved depth dimension.
Ranked #26 on Monocular 3D Object Detection on KITTI Cars Moderate
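The core of reasoning from the observed 2D projection into the unobserved depth dimension is pinhole unprojection: lifting a detected 2D location with an estimated depth back to 3D camera coordinates. A minimal sketch (the intrinsics and pixel values below are made up; MonoGRNet's actual geometric reasoning modules are not reproduced):

```python
import numpy as np

def unproject(u, v, depth, K):
    """Lift pixel (u, v) with estimated depth to a 3D point in camera
    coordinates, using pinhole intrinsics K."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Hypothetical intrinsics: focal length 700 px, principal point (320, 240).
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
center = unproject(320.0, 240.0, 10.0, K)  # principal point -> optical axis
```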
no code implementations • 23 Nov 2018 • Jinglu Wang, Bo Sun, Yan Lu
In this paper, we address the problem of reconstructing an object's surface from a single image using generative networks.
no code implementations • ICCV 2017 • Lei Zhou, Siyu Zhu, Tianwei Shen, Jinglu Wang, Tian Fang, Long Quan
In this paper, we propose a scale-invariant image matching approach to tackle very large scale variations between views.
no code implementations • 28 Feb 2017 • Siyu Zhu, Tianwei Shen, Lei Zhou, Runze Zhang, Jinglu Wang, Tian Fang, Long Quan
In this paper, we tackle the accurate and consistent Structure from Motion (SfM) problem, in particular camera registration, in a parallel fashion, at scales far exceeding the memory of a single computer.
no code implementations • ICCV 2015 • Jingbo Liu, Jinglu Wang, Tian Fang, Chiew-Lan Tai, Long Quan
In this paper, we propose a structural segmentation algorithm to partition multi-view stereo reconstructed surfaces of large-scale urban environments into structural segments.