no code implementations • 19 Mar 2024 • Wendi Yang, Zihang Jiang, Shang Zhao, S. Kevin Zhou
With the recent advancements in single-image-based human mesh recovery, there is a growing interest in enhancing its performance in certain extreme scenarios, such as occlusion, while maintaining overall model accuracy.
Ranked #8 on 3D Human Pose Estimation on 3DPW
1 code implementation • 27 Feb 2024 • Haoran Lai, Qingsong Yao, Zihang Jiang, Rongsheng Wang, ZhiYang He, Xiaodong Tao, S. Kevin Zhou
The advancement of Zero-Shot Learning in the medical domain has been driven forward by using pre-trained models on large-scale image-text pairs, focusing on image-text alignment.
1 code implementation • 20 Dec 2023 • Rongsheng Wang, Qingsong Yao, Haoran Lai, ZhiYang He, Xiaodong Tao, Zihang Jiang, S. Kevin Zhou
Despite significant advancements in medical vision-language pre-training, existing methods have largely overlooked the inherent entity-specific context within radiology reports and the complex cross-modality contextual relationships between text and images.
1 code implementation • ICCV 2023 • Kehong Gong, Dongze Lian, Heng Chang, Chuan Guo, Zihang Jiang, Xinxin Zuo, Michael Bi Mi, Xinchao Wang
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
no code implementations • CVPR 2023 • Hongyi Xu, Guoxian Song, Zihang Jiang, Jianfeng Zhang, Yichun Shi, Jing Liu, WanChun Ma, Jiashi Feng, Linjie Luo
We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under full disentangled control over camera poses, facial expressions, head shapes, articulated neck and jaw poses.
no code implementations • 24 Mar 2023 • Guoxian Song, Hongyi Xu, Jing Liu, Tiancheng Zhi, Yichun Shi, Jianfeng Zhang, Zihang Jiang, Jiashi Feng, Shen Sang, Linjie Luo
Capitalizing on the recent advancement of 3D-aware GAN models, we perform guided transfer learning on a pretrained 3D GAN generator to produce multi-view-consistent stylized renderings.
1 code implementation • 26 Nov 2022 • Jianfeng Zhang, Zihang Jiang, Dingdong Yang, Hongyi Xu, Yichun Shi, Guoxian Song, Zhongcong Xu, Xinchao Wang, Jiashi Feng
Specifically, we decompose the generative 3D human synthesis into pose-guided mapping and canonical representation with predefined human pose and shape, such that the canonical representation can be explicitly driven to different poses and shapes with the guidance of a 3D parametric human model SMPL.
1 code implementation • 1 Aug 2022 • Jianfeng Zhang, Zihang Jiang, Dingdong Yang, Hongyi Xu, Yichun Shi, Guoxian Song, Zhongcong Xu, Xinchao Wang, Jiashi Feng
Unsupervised generation of clothed virtual humans with varied appearances and animatable poses is important for creating 3D human avatars and other AR/VR applications.
1 code implementation • CVPR 2022 • Yujun Shi, Kuangqi Zhou, Jian Liang, Zihang Jiang, Jiashi Feng, Philip Torr, Song Bai, Vincent Y. F. Tan
Specifically, we experimentally show that directly encouraging the CIL learner at the initial phase to output representations similar to those of the model jointly trained on all classes can greatly boost CIL performance.
7 code implementations • 24 Jun 2021 • Li Yuan, Qibin Hou, Zihang Jiang, Jiashi Feng, Shuicheng Yan
Though the prevailing vision transformers (ViTs) have recently shown the great potential of self-attention-based models in ImageNet classification, their performance is still inferior to that of the latest SOTA CNNs if no extra data are provided.
Ranked #1 on Image Classification on VizWiz-Classification
4 code implementations • 23 Jun 2021 • Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng
Recognizing the importance of the positional information carried by 2D feature representations, Vision Permutator separately encodes the feature representations along the height and width dimensions with linear projections, unlike recent MLP-like models that encode the spatial information along the flattened spatial dimensions.
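The height-wise and width-wise encoding described in this abstract can be illustrated with a minimal NumPy sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: the function name `permute_mlp_sketch`, the weight names, the extra channel projection, and the plain sum used to fuse the branches are all hypothetical choices made for the example.

```python
import numpy as np

def permute_mlp_sketch(x, w_h, w_w, w_c):
    """Mix a feature map along height, width, and channels separately.

    x:   feature map of shape (H, W, C)
    w_h: (H, H) linear projection applied along the height axis
    w_w: (W, W) linear projection applied along the width axis
    w_c: (C, C) linear projection applied along the channel axis
    All names here are hypothetical; this is an illustration only.
    """
    # Project along height: every (w, c) column is mixed across its H entries.
    out_h = np.einsum("hwc,hk->kwc", x, w_h)
    # Project along width: every (h, c) row is mixed across its W entries.
    out_w = np.einsum("hwc,wk->hkc", x, w_w)
    # Ordinary per-position channel mixing (assumed third branch).
    out_c = x @ w_c
    # Fuse the branches with a plain sum (an assumption made for brevity).
    return out_h + out_w + out_c

rng = np.random.default_rng(0)
H, W, C = 2, 3, 4
x = rng.normal(size=(H, W, C))
out = permute_mlp_sketch(
    x,
    rng.normal(size=(H, H)),
    rng.normal(size=(W, W)),
    rng.normal(size=(C, C)),
)
print(out.shape)  # (2, 3, 4): the 2D layout is kept, nothing is flattened
```

Because each projection acts on a single axis, the height and width positions stay separate throughout, in contrast to a model that flattens the H x W grid into one token axis before mixing.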
1 code implementation • Findings (ACL) 2021 • Weihao Yu, Zihang Jiang, Fei Chen, Qibin Hou, Jiashi Feng
In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order.
1 code implementation • 7 Jun 2021 • Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng
Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.
Ranked #175 on Image Classification on ImageNet
6 code implementations • NeurIPS 2021 • Zihang Jiang, Qibin Hou, Li Yuan, Daquan Zhou, Yujun Shi, Xiaojie Jin, Anran Wang, Jiashi Feng
In this paper, we present token labeling -- a new training objective for training high-performance vision transformers (ViTs).
Ranked #3 on Efficient ViTs on ImageNet-1K (With LV-ViT-S)
5 code implementations • 22 Mar 2021 • Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng
In this paper, we show that, unlike convolutional neural networks (CNNs) that can be improved by stacking more convolutional layers, the performance of ViTs saturates fast when scaled to be deeper.
Ranked #427 on Image Classification on ImageNet
13 code implementations • ICCV 2021 • Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan
To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation that progressively structures the image into tokens by recursively aggregating neighboring tokens into one token (Tokens-to-Token), so that the local structure represented by surrounding tokens can be modeled and the token length can be reduced; and 2) an efficient backbone with a deep-narrow structure for vision transformers, motivated by CNN architecture design after an empirical study.
Ranked #404 on Image Classification on ImageNet
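The neighbor-aggregation step described in the T2T-ViT abstract above can be sketched in NumPy as follows. This is a rough illustration under stated assumptions rather than the paper's code: the function name `soft_split_sketch`, the window size `k`, the `stride`, and the absence of padding are hypothetical choices, and the transformer layer that a full model would apply between aggregation steps is omitted.

```python
import numpy as np

def soft_split_sketch(tokens, h, w, k=3, stride=2):
    """Aggregate each k x k neighborhood of tokens into one longer token.

    tokens: array of shape (h * w, c), one row per spatial position
    Returns an array of shape (num_windows, k * k * c).
    Names and defaults are hypothetical; no padding is applied.
    """
    n, c = tokens.shape
    assert n == h * w, "token count must match the h x w grid"
    # Restore the 2D layout so that spatial neighbors are adjacent again.
    grid = tokens.reshape(h, w, c)
    merged = []
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            # Concatenate all tokens in the window into a single token.
            merged.append(grid[i:i + k, j:j + k, :].reshape(-1))
    return np.stack(merged)

tokens = np.arange(8 * 8 * 4, dtype=float).reshape(64, 4)
merged = soft_split_sketch(tokens, h=8, w=8)
print(tokens.shape, "->", merged.shape)  # (64, 4) -> (9, 36)
```

With overlapping windows (stride smaller than `k`), each output token carries the local structure of its surrounding tokens, while the number of tokens drops from 64 to 9 in this example.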