Self-Supervised Learning from Non-Object Centric Images with a Geometric Transformation Sensitive Architecture

17 Apr 2023  ·  TaeHo Kim, Jong-Min Lee ·

Most invariance-based self-supervised methods rely on single object-centric images (e.g., ImageNet images) for pretraining, learning features that invariant to geometric transformation. However, when images are not object-centric, the semantics of the image can be significantly altered due to cropping. Furthermore, as the model becomes insensitive to geometric transformations, it may struggle to capture location information. For this reason, we propose a Geometric Transformation Sensitive Architecture designed to be sensitive to geometric transformations, specifically focusing on four-fold rotation, random crop, and multi-crop. Our method encourages the student to be sensitive by predicting rotation and using targets that vary with those transformations through pooling and rotating the teacher feature map. Additionally, we use patch correspondence loss to encourage correspondence between patches with similar features. This approach allows us to capture long-term dependencies in a more appropriate way than capturing long-term dependencies by encouraging local-to-global correspondence, which occurs when learning to be insensitive to multi-crop. Our approach demonstrates improved performance when using non-object-centric images as pretraining data compared to other methods that train the model to be insensitive to geometric transformation. We surpass DINO[Caron et al.[2021b]] baseline in tasks including image classification, semantic segmentation, detection, and instance segmentation with improvements of 4.9 $Top-1 Acc$, 3.3 $mIoU$, 3.4 $AP^b$, and 2.7 $AP^m$. Code and pretrained models are publicly available at: https://github.com/bok3948/GTSA

PDF Abstract

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods