Search Results for author: Zhicheng Yan

Found 25 papers, 14 papers with code

EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding

1 code implementation • ICCV 2023 • Chenchen Zhu, Fanyi Xiao, Andres Alvarado, Yasmine Babaei, Jiabo Hu, Hichem El-Mohri, Sean Chang Culatana, Roshan Sumbaly, Zhicheng Yan

To bootstrap research on EgoObjects, we present a suite of 4 benchmark tasks around egocentric object understanding, including a novel instance-level and the classical category-level object detection.

Continual Learning Object +2

Exploring Open-Vocabulary Semantic Segmentation without Human Labels

no code implementations • 1 Jun 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana

Although VL models have acquired extensive knowledge of visual concepts, it is non-trivial to exploit this knowledge for the task of semantic segmentation, as these models are usually trained at the image level.

Open Vocabulary Semantic Segmentation Segmentation +3

Going Denser with Open-Vocabulary Part Segmentation

2 code implementations • ICCV 2023 • Peize Sun, Shoufa Chen, Chenchen Zhu, Fanyi Xiao, Ping Luo, Saining Xie, Zhicheng Yan

In this paper, we propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.

Object object-detection +3

Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only

no code implementations • ICCV 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Sean Chang Culatana, Mohamed Elhoseiny

Semantic segmentation is a crucial task in computer vision that involves segmenting images into semantically meaningful regions at the pixel level.

Open Vocabulary Semantic Segmentation Segmentation +3

3rd Continual Learning Workshop Challenge on Egocentric Category and Instance Level Object Understanding

1 code implementation • 13 Dec 2022 • Lorenzo Pellegrini, Chenchen Zhu, Fanyi Xiao, Zhicheng Yan, Antonio Carta, Matthias De Lange, Vincenzo Lomonaco, Roshan Sumbaly, Pau Rodriguez, David Vazquez

Continual Learning, also known as Lifelong or Incremental Learning, has recently gained renewed interest among the Artificial Intelligence research community.

Continual Learning Incremental Learning +3

Unified Transformer Tracker for Object Tracking

1 code implementation • CVPR 2022 • Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan

Although UniTrack demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking.

Multiple Object Tracking Object

Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search

no code implementations • 9 Dec 2021 • Yifan Jiang, Xinyu Gong, Junru Wu, Humphrey Shi, Zhicheng Yan, Zhangyang Wang

Efficient video architecture is the key to deploying video recognition systems on devices with limited computing resources.

Neural Architecture Search Video Recognition +1

NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training

1 code implementation • ICLR 2022 • Chengyue Gong, Dilin Wang, Meng Li, Xinlei Chen, Zhicheng Yan, Yuandong Tian, Qiang Liu, Vikas Chandra

In this work, we observe that the poor performance is due to a gradient conflict issue: the gradients of different sub-networks conflict with that of the supernet more severely in ViTs than in CNNs, which leads to early saturation in training and inferior convergence.

Ranked #7 on Neural Architecture Search on ImageNet

Data Augmentation Image Classification +2

Searching for Two-Stream Models in Multivariate Space for Video Recognition

no code implementations • ICCV 2021 • Xinyu Gong, Heng Wang, Zheng Shou, Matt Feiszli, Zhangyang Wang, Zhicheng Yan

We design a multivariate search space, including 6 search variables to capture a wide variety of choices in designing two-stream models.

Neural Architecture Search Video Recognition +1

Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

1 code implementation • 26 Aug 2021 • Wuyang Chen, Xinyu Gong, Junru Wu, Yunchao Wei, Humphrey Shi, Zhicheng Yan, Yi Yang, Zhangyang Wang

This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation.

Neural Architecture Search

Multiscale Vision Transformers

7 code implementations • ICCV 2021 • Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals on a variety of video recognition tasks, where it outperforms concurrent vision transformers that rely on large-scale external pre-training and are 5-10x more costly in computation and parameters.

Ranked #14 on Action Classification on Charades

Action Classification Action Recognition +2

Visual Transformers: Where Do Transformers Really Belong in Vision Models?

no code implementations • ICCV 2021 • Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph E. Gonzalez, Kurt Keutzer, Peter Vajda

A recent trend in computer vision is to replace convolutions with transformers.

Semantic Segmentation

FP-NAS: Fast Probabilistic Neural Architecture Search

no code implementations • CVPR 2021 • Zhicheng Yan, Xiaoliang Dai, Peizhao Zhang, Yuandong Tian, Bichen Wu, Matt Feiszli

Furthermore, to search fast in the multi-variate space, we propose a coarse-to-fine strategy by using a factorized distribution at the beginning which can reduce the number of architecture parameters by over an order of magnitude.

Neural Architecture Search

Visual Transformers: Token-based Image Representation and Processing for Computer Vision

8 code implementations • 5 Jun 2020 • Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, Peter Vajda

In this work, we challenge this paradigm by (a) representing images as semantic visual tokens and (b) running transformers to densely model token relationships.

General Classification Image Classification +1

Decoupling Representation and Classifier for Long-Tailed Recognition

4 code implementations • ICLR 2020 • Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis

The long-tail distribution of the visual world poses great challenges for deep-learning-based classification models in handling the class imbalance problem.

Ranked #3 on Long-tail learning with class descriptors on CUB-LT

Classification General Classification +3

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

no code implementations • 19 Jul 2019 • Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

However, in current video datasets it has been observed that action classes can often be recognized without any temporal information from a single frame of video.

Benchmarking Motion Estimation +1

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

28 code implementations • ICCV 2019 • Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng

Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies.

Ranked #147 on Action Classification on Kinetics-400

Action Classification Image Classification +1

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

no code implementations • CVPR 2019 • Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan

Motion has been shown to be useful for video understanding, where it is typically represented by optical flow.

Ranked #1 on Action Recognition on UCF-101

Action Classification Action Recognition In Videos +3

Graph-Based Global Reasoning Networks

9 code implementations • CVPR 2019 • Yunpeng Chen, Marcus Rohrbach, Zhicheng Yan, Shuicheng Yan, Jiashi Feng, Yannis Kalantidis

In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed.

Action Classification Action Recognition +4

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

2 code implementations • ICCV 2019 • Hang Zhao, Antonio Torralba, Lorenzo Torresani, Zhicheng Yan

This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos.

Ranked #10 on Temporal Action Localization on HACS

Action Classification Action Recognition +3

Learning Concept Taxonomies from Multi-modal Data

no code implementations • ACL 2016 • Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan, Eric P. Xing

We study the problem of automatically building hypernym taxonomies from textual and visual data.

Feature Engineering

Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation

no code implementations • 15 Mar 2016 • Zhicheng Yan, Hao Zhang, Yangqing Jia, Thomas Breuel, Yizhou Yu

State-of-the-art results of semantic segmentation are established by Fully Convolutional neural Networks (FCNs).

Semantic Segmentation

HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition

no code implementations • ICCV 2015 • Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis Decoste, Wei Di, Yizhou Yu

In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy.

Image Classification Object Recognition

Automatic Photo Adjustment Using Deep Neural Networks

1 code implementation • 24 Dec 2014 • Zhicheng Yan, Hao Zhang, Baoyuan Wang, Sylvain Paris, Yizhou Yu

Many photographic styles rely on subtle adjustments that depend on the image content and even its semantics.

Photo Retouching

HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition

4 code implementations • 3 Oct 2014 • Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis Decoste, Wei Di, Yizhou Yu

In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy.

Ranked #174 on Image Classification on CIFAR-100

Image Classification Object Recognition
