1 code implementation • ICCV 2023 • Chenchen Zhu, Fanyi Xiao, Andres Alvarado, Yasmine Babaei, Jiabo Hu, Hichem El-Mohri, Sean Chang Culatana, Roshan Sumbaly, Zhicheng Yan
To bootstrap the research on EgoObjects, we present a suite of 4 benchmark tasks around the egocentric object understanding, including a novel instance level- and the classical category level object detection.
no code implementations • 1 Jun 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana
Although acquired extensive knowledge of visual concepts, it is non-trivial to exploit knowledge from these VL models to the task of semantic segmentation, as they are usually trained at an image level.
2 code implementations • ICCV 2023 • Peize Sun, Shoufa Chen, Chenchen Zhu, Fanyi Xiao, Ping Luo, Saining Xie, Zhicheng Yan
In this paper, we propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.
no code implementations • ICCV 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Sean Chang Culatana, Mohamed Elhoseiny
Semantic segmentation is a crucial task in computer vision that involves segmenting images into semantically meaningful regions at the pixel level.
1 code implementation • 13 Dec 2022 • Lorenzo Pellegrini, Chenchen Zhu, Fanyi Xiao, Zhicheng Yan, Antonio Carta, Matthias De Lange, Vincenzo Lomonaco, Roshan Sumbaly, Pau Rodriguez, David Vazquez
Continual Learning, also known as Lifelong or Incremental Learning, has recently gained renewed interest among the Artificial Intelligence research community.
1 code implementation • CVPR 2022 • Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan
Although UniTrack \cite{wang2021different} demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking.
no code implementations • 9 Dec 2021 • Yifan Jiang, Xinyu Gong, Junru Wu, Humphrey Shi, Zhicheng Yan, Zhangyang Wang
Efficient video architecture is the key to deploying video recognition systems on devices with limited computing resources.
1 code implementation • ICLR 2022 • Chengyue Gong, Dilin Wang, Meng Li, Xinlei Chen, Zhicheng Yan, Yuandong Tian, Qiang Liu, Vikas Chandra
In this work, we observe that the poor performance is due to a gradient conflict issue: the gradients of different sub-networks conflict with that of the supernet more severely in ViTs than CNNs, which leads to early saturation in training and inferior convergence.
Ranked #7 on Neural Architecture Search on ImageNet
no code implementations • ICCV 2021 • Xinyu Gong, Heng Wang, Zheng Shou, Matt Feiszli, Zhangyang Wang, Zhicheng Yan
We design a multivariate search space, including 6 search variables to capture a wide variety of choices in designing two-stream models.
1 code implementation • 26 Aug 2021 • Wuyang Chen, Xinyu Gong, Junru Wu, Yunchao Wei, Humphrey Shi, Zhicheng Yan, Yi Yang, Zhangyang Wang
This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation.
7 code implementations • ICCV 2021 • Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer
We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.
Ranked #14 on Action Classification on Charades
no code implementations • ICCV 2021 • Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph E. Gonzalez, Kurt Keutzer, Peter Vajda
A recent trend in computer vision is to replace convolutions with transformers.
no code implementations • CVPR 2021 • Zhicheng Yan, Xiaoliang Dai, Peizhao Zhang, Yuandong Tian, Bichen Wu, Matt Feiszli
Furthermore, to search fast in the multi-variate space, we propose a coarse-to-fine strategy by using a factorized distribution at the beginning which can reduce the number of architecture parameters by over an order of magnitude.
8 code implementations • 5 Jun 2020 • Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, Peter Vajda
In this work, we challenge this paradigm by (a) representing images as semantic visual tokens and (b) running transformers to densely model token relationships.
4 code implementations • ICLR 2020 • Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis
The long-tail distribution of the visual world poses great challenges for deep learning based classification models on how to handle the class imbalance problem.
Ranked #3 on Long-tail learning with class descriptors on CUB-LT
no code implementations • 19 Jul 2019 • Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani
However, in current video datasets it has been observed that action classes can often be recognized without any temporal information from a single frame of video.
28 code implementations • ICCV 2019 • Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng
Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies.
Ranked #147 on Action Classification on Kinetics-400
no code implementations • CVPR 2019 • Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan
Motion has shown to be useful for video understanding, where motion is typically represented by optical flow.
Ranked #1 on Action Recognition on UCF-101
9 code implementations • CVPR 2019 • Yunpeng Chen, Marcus Rohrbach, Zhicheng Yan, Shuicheng Yan, Jiashi Feng, Yannis Kalantidis
In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed.
2 code implementations • ICCV 2019 • Hang Zhao, Antonio Torralba, Lorenzo Torresani, Zhicheng Yan
This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos.
Ranked #10 on Temporal Action Localization on HACS
no code implementations • ACL 2016 • Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan, Eric P. Xing
We study the problem of automatically building hypernym taxonomies from textual and visual data.
no code implementations • 15 Mar 2016 • Zhicheng Yan, Hao Zhang, Yangqing Jia, Thomas Breuel, Yizhou Yu
State-of-the-art results of semantic segmentation are established by Fully Convolutional neural Networks (FCNs).
no code implementations • ICCV 2015 • Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis Decoste, Wei Di, Yizhou Yu
In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy.
1 code implementation • 24 Dec 2014 • Zhicheng Yan, Hao Zhang, Baoyuan Wang, Sylvain Paris, Yizhou Yu
Many photographic styles rely on subtle adjustments that depend on the image content and even its semantics.
4 code implementations • 3 Oct 2014 • Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis Decoste, Wei Di, Yizhou Yu
In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy.
Ranked #174 on Image Classification on CIFAR-100