Causal Unsupervised Semantic Segmentation

11 Oct 2023 · Junho Kim, Byung-Kwan Lee, Yong Man Ro

Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations. With the advent of self-supervised pre-training, various frameworks utilize the pre-trained features to train prediction heads for unsupervised dense prediction. However, a significant challenge in this unsupervised setup is determining the appropriate level of clustering required for segmenting concepts. To address this, we propose a novel framework, CAusal Unsupervised Semantic sEgmentation (CAUSE), which leverages insights from causal inference. Specifically, we bridge an intervention-oriented approach (i.e., frontdoor adjustment) to define suitable two-step tasks for unsupervised prediction. The first step involves constructing a concept clusterbook as a mediator, which represents possible concept prototypes at different levels of granularity in a discretized form. Then, the mediator establishes an explicit link to the subsequent concept-wise self-supervised learning for pixel-level grouping. Through extensive experiments and analyses on various datasets, we corroborate the effectiveness of CAUSE and achieve state-of-the-art performance in unsupervised semantic segmentation.
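
For readers unfamiliar with the term, the frontdoor adjustment mentioned in the abstract has a standard general form in causal inference, sketched below. The symbols are assumptions chosen for illustration and do not come from the abstract: X stands for the input (for example, pre-trained features), M for the mediator (here, the concept clusterbook), and Y for the prediction. How the paper instantiates each term may differ.

```latex
% General frontdoor adjustment (illustrative notation, not taken from the paper):
%   X = treatment / input, M = mediator, Y = outcome
P\bigl(Y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{m} P(m \mid x) \sum_{x'} P(Y \mid x', m)\, P(x')
```

Read as two steps, the formula first estimates the mediator from the input, P(m | x), and then estimates the outcome from the mediator while averaging over inputs, which loosely mirrors the two-step structure the abstract describes.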

Task                               | Dataset             | Model                    | Metric Name       | Metric Value | Global Rank
-----------------------------------|---------------------|--------------------------|-------------------|--------------|------------
Unsupervised Semantic Segmentation | Cityscapes test     | CAUSE (DINOv2, ViT-B/14) | mIoU              | 29.9         | #1
                                   |                     |                          | Accuracy          | 89.8         | #2
Unsupervised Semantic Segmentation | Cityscapes test     | CAUSE (ViT-B/8)          | mIoU              | 28.0         | #2
                                   |                     |                          | Accuracy          | 90.8         | #1
Unsupervised Semantic Segmentation | COCO-Stuff-171      | CAUSE-TR (ViT-S/8)       | mIoU              | 15.2         | #1
                                   |                     |                          | Pixel Accuracy    | 46.6         | #1
Unsupervised Semantic Segmentation | COCO-Stuff-27       | CAUSE (DINOv2, ViT-B/14) | Accuracy          | 78.0         | #1
                                   |                     |                          | mIoU              | 45.3         | #1
Unsupervised Semantic Segmentation | COCO-Stuff-27       | CAUSE (ViT-B/8)          | Accuracy          | 74.9         | #2
                                   |                     |                          | mIoU              | 41.9         | #2
Unsupervised Semantic Segmentation | COCO-Stuff-81       | CAUSE-TR (ViT-S/8)       | mIoU              | 21.2         | #1
                                   |                     |                          | Pixel Accuracy    | 75.2         | #2
Unsupervised Semantic Segmentation | COCO-Stuff-81       | CAUSE-MLP (ViT-S/8)      | mIoU              | 19.1         | #2
                                   |                     |                          | Pixel Accuracy    | 78.8         | #1
Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | CAUSE (ViT-B/8)          | Clustering [mIoU] | 53.3         | #2
Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | CAUSE (iBOT, ViT-B/16)   | Clustering [mIoU] | 53.4         | #1
Unsupervised Semantic Segmentation | PASCAL VOC 2012 val | CAUSE (DINOv2, ViT-B/14) | Clustering [mIoU] | 53.2         | #3
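
The mIoU and (pixel) accuracy figures above follow the usual definitions for semantic segmentation. The following is a minimal sketch of how such metrics are commonly computed from flat lists of per-pixel labels. It is not the paper's evaluation code: the function names are hypothetical, and it assumes the predicted cluster ids have already been mapped to ground-truth class ids (unsupervised benchmarks typically do this with Hungarian matching beforehand).

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels; rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        # Skip labels outside [0, num_classes), e.g. an "ignore" label.
        if 0 <= t < num_classes and 0 <= p < num_classes:
            cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of counted pixels whose predicted class equals the ground truth."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN); classes with an empty union are skipped."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # ground truth c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, ground truth otherwise
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with six pixels and three classes (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, target, num_classes=3)
acc = pixel_accuracy(cm)
miou = mean_iou(cm)
```

Both functions return fractions in [0, 1]; the table reports the same quantities as percentages.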
