no code implementations • ECCV 2020 • Zetong Yang, Yanan Sun, Shu Liu, Xiaojuan Qi, Jiaya Jia
In 3D recognition, existing methods fuse multi-scale structure information with hierarchical frameworks that stack multiple fusion layers, each integrating current relative locations with structure information from the previous level.
no code implementations • 14 Mar 2024 • Zetong Yang, Zhiding Yu, Chris Choy, Renhao Wang, Anima Anandkumar, Jose M. Alvarez
This mapping allows the depth estimation of distant objects conditioned on their 2D boxes, making long-range 3D detection with 2D supervision feasible.
1 code implementation • 29 Dec 2023 • Zetong Yang, Li Chen, Yanan Sun, Hongyang Li
To resolve this, we introduce a new pre-training task termed visual point cloud forecasting: predicting future point clouds from historical visual input.
3 code implementations • 28 Dec 2023 • Haisong Liu, Yang Chen, Haiguang Wang, Zetong Yang, Tianyu Li, Jia Zeng, Li Chen, Hongyang Li, LiMin Wang
Occupancy prediction plays a pivotal role in autonomous driving.
no code implementations • CVPR 2023 • Li Jiang, Zetong Yang, Shaoshuai Shi, Vladislav Golyanik, Dengxin Dai, Bernt Schiele
Masked signal modeling has greatly advanced self-supervised pre-training for language and 2D images.
1 code implementation • CVPR 2022 • Zetong Yang, Li Jiang, Yanan Sun, Bernt Schiele, Jiaya Jia
This is achieved by introducing an intermediate representation, i.e., Q-representation, in the querying stage to serve as a bridge between the embedding stage and task heads.
Ranked #7 on Semantic Segmentation on S3DIS
no code implementations • CVPR 2021 • Zetong Yang, Yin Zhou, Zhifeng Chen, Jiquan Ngiam
In this paper, we present 3D-MAN: a 3D multi-frame attention network that effectively aggregates features from multiple perspectives and achieves state-of-the-art performance on the Waymo Open Dataset.
2 code implementations • CVPR 2020 • Zetong Yang, Yanan Sun, Shu Liu, Jiaya Jia
Our method outperforms all state-of-the-art voxel-based single-stage methods by a large margin and performs comparably to two-stage point-based methods, with an inference speed of more than 25 FPS, 2x faster than former state-of-the-art point-based methods.
no code implementations • ICCV 2019 • Zetong Yang, Yanan Sun, Shu Liu, Xiaoyong Shen, Jiaya Jia
We present a new two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD).
no code implementations • 13 Dec 2018 • Zetong Yang, Yanan Sun, Shu Liu, Xiaoyong Shen, Jiaya Jia
We present a novel 3D object detection framework, named IPOD, based on raw point clouds.
Ranked #1 on 3D Object Detection on KITTI Pedestrians Easy