no code implementations • 2 Oct 2023 • Qiyu Wu, Mengjie Zhao, Yutong He, Lang Huang, Junya Ono, Hiromi Wakaki, Yuki Mitsufuji
In this paper, we focus on the pervasive reporting bias in visual-language datasets, embodied as object-attribute associations, which can subsequently degrade models trained on them.
1 code implementation • 21 Aug 2023 • Mingkai Zheng, Shan You, Lang Huang, Xiu Su, Fei Wang, Chen Qian, Xiaogang Wang, Chang Xu
Moreover, to further boost performance, we propose "distributional consistency" as a more informative regularization that encourages similar instances to have similar probability distributions.
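A minimal sketch of how such a regularizer could look, assuming a symmetric-KL form between the class distributions of two views of the same instance (the function name and the symmetric-KL choice are illustrative assumptions, not necessarily the paper's exact loss):

```python
# Sketch of a distributional-consistency regularizer: two views of an
# instance are encouraged to share a similar class distribution.
import torch
import torch.nn.functional as F

def distributional_consistency(logits_a: torch.Tensor,
                               logits_b: torch.Tensor) -> torch.Tensor:
    p = F.log_softmax(logits_a, dim=-1)
    q = F.log_softmax(logits_b, dim=-1)
    # Symmetric KL, averaged over the batch (an assumed, common choice).
    kl_pq = F.kl_div(q, p, reduction="batchmean", log_target=True)
    kl_qp = F.kl_div(p, q, reduction="batchmean", log_target=True)
    return 0.5 * (kl_pq + kl_qp)

# Usage: loss = task_loss + lam * distributional_consistency(la, lb)
```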
2 code implementations • ICCV 2023 • Mingkai Zheng, Shan You, Lang Huang, Chen Luo, Fei Wang, Chen Qian, Chang Xu
Semi-supervised image classification is one of the most fundamental problems in computer vision, as it significantly reduces the need for human labeling.
1 code implementation • 12 Jul 2022 • Tao Huang, Lang Huang, Shan You, Fei Wang, Chen Qian, Chang Xu
Vision transformers (ViTs) are usually considered less lightweight than convolutional neural networks (CNNs) due to their lack of inductive bias.
1 code implementation • 26 May 2022 • Lang Huang, Shan You, Mingkai Zheng, Fei Wang, Chen Qian, Toshihiko Yamasaki
We present an efficient approach to Masked Image Modeling (MIM) with hierarchical Vision Transformers (ViTs) that allows them to discard masked patches and operate only on the visible ones.
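For intuition, here is a minimal MAE-style sketch of keeping only visible patches before encoding; names are illustrative, and the paper's actual contribution, grouping the sparse visible tokens so that hierarchical window attention still works, is more involved:

```python
# Sketch: randomly mask patches and gather the visible subset, so the
# encoder's cost scales with visible tokens rather than the full grid.
import torch

def random_masking(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """tokens: (B, N, D). Keep a random subset of patches per image."""
    B, N, D = tokens.shape
    n_keep = int(N * (1.0 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)   # random scores
    ids_keep = noise.argsort(dim=1)[:, :n_keep]      # indices of visible patches
    visible = torch.gather(
        tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=tokens.device)
    mask.scatter_(1, ids_keep, 0.0)                  # 0 = visible, 1 = masked
    return visible, mask, ids_keep
```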
1 code implementation • CVPR 2022 • Lang Huang, Shan You, Mingkai Zheng, Fei Wang, Chen Qian, Toshihiko Yamasaki
In this paper, we present a new approach, Learning Where to Learn (LEWEL), that adaptively aggregates the spatial information of features so that the projected embeddings are exactly aligned and thus better guide feature learning.
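A hedged sketch of the aggregation idea, assuming the spatial weights are predicted by a 1x1 convolution and normalized with a softmax (shapes, layer choices, and the number of alignment maps are assumptions for illustration only):

```python
# Sketch: learn spatial alignment maps and use them as pooling weights,
# producing one aggregated embedding per map.
import torch
import torch.nn as nn

class AdaptiveAggregation(nn.Module):
    def __init__(self, dim: int, n_maps: int = 4):
        super().__init__()
        # Predict n_maps spatial alignment maps from the feature map.
        self.to_maps = nn.Conv2d(dim, n_maps, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        """feats: (B, C, H, W) -> (B, n_maps, C) aligned embeddings."""
        maps = self.to_maps(feats).flatten(2).softmax(dim=-1)  # (B, M, HW)
        flat = feats.flatten(2)                                # (B, C, HW)
        # Weighted average of spatial features per alignment map.
        return torch.einsum("bmn,bcn->bmc", maps, flat)
```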
1 code implementation • CVPR 2022 • Mingkai Zheng, Shan You, Lang Huang, Fei Wang, Chen Qian, Chang Xu
Learning from limited labeled data has been a longstanding problem in the computer vision and machine learning communities.
2 code implementations • NeurIPS 2021 • Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang
We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer, which produces low-resolution representations at high memory and computational cost.
Ranked #3 on Pose Estimation on AIC
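A rough sketch of how self-attention can be kept at full resolution via non-overlapping local windows; this is an assumption-laden simplification, since HRFormer also uses a multi-resolution parallel design and depth-wise convolutions in its FFN:

```python
# Sketch: attention restricted to w*w windows, so the output feature map
# keeps the same (high) resolution as the input.
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    def __init__(self, dim: int, window: int = 7, heads: int = 4):
        super().__init__()  # dim must be divisible by heads
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (B, C, H, W); H and W assumed divisible by `window`."""
        B, C, H, W = x.shape
        w = self.window
        # Partition into non-overlapping windows of w*w tokens each.
        t = x.reshape(B, C, H // w, w, W // w, w)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        t, _ = self.attn(t, t, t)            # attention inside each window
        # Reverse the partition: back to (B, C, H, W), same resolution.
        t = t.reshape(B, H // w, W // w, w, w, C)
        return t.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
```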
2 code implementations • 21 Jan 2021 • Lang Huang, Chao Zhang, Hongyang Zhang
We propose self-adaptive training -- a unified training algorithm that dynamically calibrates and enhances the training process using model predictions, without incurring extra computational cost -- to advance both supervised and self-supervised learning of deep neural networks.
4 code implementations • NeurIPS 2020 • Lang Huang, Chao Zhang, Hongyang Zhang
We propose self-adaptive training -- a new training algorithm that dynamically corrects problematic training labels using model predictions, without incurring extra computational cost -- to improve the generalization of deep learning on potentially corrupted training data.
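A minimal sketch of the label-correction idea behind self-adaptive training, assuming an exponential moving average that pulls per-example targets toward the model's own predictions (the update rule and hyperparameter names are illustrative simplifications of the papers' method):

```python
# Sketch: per-example soft targets start at the (possibly noisy) labels and
# drift toward the model's predictions over training.
import torch
import torch.nn.functional as F

class SelfAdaptiveTargets:
    def __init__(self, labels: torch.Tensor, n_classes: int, alpha: float = 0.9):
        # Targets assumed to live on the same device as the logits.
        self.targets = F.one_hot(labels, n_classes).float()
        self.alpha = alpha

    def update_and_loss(self, logits: torch.Tensor, idx: torch.Tensor):
        probs = logits.softmax(dim=-1).detach()
        # Moving average pulls targets toward the model's own predictions.
        self.targets[idx] = (self.alpha * self.targets[idx]
                             + (1 - self.alpha) * probs)
        # Cross-entropy against the corrected soft targets.
        return -(self.targets[idx] * F.log_softmax(logits, -1)).sum(-1).mean()
```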
1 code implementation • ICCV 2019 • Jianyuan Guo, Yuhui Yuan, Lang Huang, Chao Zhang, Jinge Yao, Kai Han
On the other hand, there still exist many useful contextual cues that do not fall into the scope of predefined human parts or attributes.
Ranked #59 on Person Re-Identification on DukeMTMC-reID
6 code implementations • 29 Jul 2019 • Lang Huang, Yuhui Yuan, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang
There are two successive attention modules, each estimating a sparse affinity matrix.
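A hedged 1-D sketch of the interlacing idea: one attention step over strided, long-range subsets of positions, then one within contiguous local groups, so each module only estimates a small affinity matrix (the plain-attention helper and shapes are illustrative simplifications):

```python
# Sketch: decompose dense self-attention into two sparse steps.
import torch

def attend(x: torch.Tensor) -> torch.Tensor:
    """Plain self-attention over the token dim: x is (..., N, C)."""
    a = torch.softmax(x @ x.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
    return a @ x

def interlaced_attention(x: torch.Tensor, group: int) -> torch.Tensor:
    """x: (B, N, C) with N divisible by `group`."""
    B, N, C = x.shape
    # Long-range: attend among positions spaced N // group apart.
    t = x.reshape(B, group, N // group, C).transpose(1, 2)
    t = attend(t).transpose(1, 2).reshape(B, N, C)
    # Short-range: attend within contiguous groups of `group` positions.
    t = t.reshape(B, N // group, group, C)
    return attend(t).reshape(B, N, C)
```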
8 code implementations • 4 Sep 2018 • Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang
To capture richer contextual information, we further combine our interlaced sparse self-attention scheme with conventional multi-scale context schemes, including pyramid pooling (Zhao et al., 2017) and atrous spatial pyramid pooling (Chen et al., 2018); a fusion sketch follows this entry.
Ranked #9 on Semantic Segmentation on Trans10K
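As referenced above, here is a sketch of how an attention-based context could be fused with atrous spatial pyramid pooling, assuming a simple concatenate-and-project head (module names, dilation rates, and the fusion choice are assumptions, not the paper's exact head):

```python
# Sketch: run dilated-conv branches alongside an attention branch, then
# concatenate and project the contexts.
import torch
import torch.nn as nn

class ASPPWithAttention(nn.Module):
    def __init__(self, dim: int, attention: nn.Module, rates=(6, 12, 18)):
        super().__init__()
        self.attention = attention  # e.g., an interlaced self-attention block
        self.branches = nn.ModuleList(
            [nn.Conv2d(dim, dim, 3, padding=r, dilation=r) for r in rates])
        self.project = nn.Conv2d(dim * (len(rates) + 1), dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # All branches preserve the (B, C, H, W) spatial layout.
        ctx = [self.attention(x)] + [b(x) for b in self.branches]
        return self.project(torch.cat(ctx, dim=1))
```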