SDSC-UNet: Dual Skip Connection ViT-based U-shaped Model for Building Extraction

Benefiting from effective global information interaction, vision transformers (ViTs) have been widely used for the building extraction task. However, buildings in remote sensing (RS) images usually differ greatly in size. Mainstream ViT-based segmentation models for RS images are based on the Swin Transformer, which lacks multi-scale information inside the ViT block. In addition, they connect only the output of the entire ViT encoder block to the decoder, ignoring the similarity information of the attention maps inside the ViT encoder block, and thus cannot provide better global dependencies for the decoder. To solve the above problems, we introduce the novel Shunted Transformer, which enables the model to capture multi-scale information internally while fully establishing global dependencies, to build a pure ViT-based U-shaped model for building extraction. Furthermore, unlike the single-skip-connection structure of previous U-shaped methods, we build a novel dual skip connection structure inside the model. It simultaneously transmits the attention maps inside the ViT encoder block and the block's entire output to the decoder, thereby fully mining the information of the ViT encoder block and providing better global information guidance for the decoder. Accordingly, our model is named Shunted Dual Skip Connection UNet (SDSC-UNet). We also design a feature fusion module, the Dual Skip Upsample Fusion Module (DSUFM), to aggregate this information. Our model yields state-of-the-art (SOTA) performance (83.02% IoU) on the Inria Aerial Image Labeling Dataset. Code will be available at: https://github.com/stdcoutzrh/BuildingExtraction.
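The dual skip connection idea above can be sketched in a few lines: a ViT encoder block exposes both its attention map and its output, and a DSUFM-like step fuses the two with the decoder feature. This is a minimal NumPy sketch with simplified single-head attention; the function names (`vit_block`, `dsufm_like_fusion`) and the fusion rule are hypothetical illustrations, not the paper's actual DSUFM.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vit_block(x, wq, wk, wv):
    """Single-head self-attention block that exposes BOTH skip signals:
    the attention map and the block output."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (N, N) attention map
    out = attn @ v                                  # (N, d) block output
    return attn, out

def dsufm_like_fusion(attn, enc_out, dec_feat):
    """Hypothetical DSUFM-style fusion: reweight the decoder feature with
    the encoder attention map, then add the encoder output (skip 2)."""
    return attn @ dec_feat + enc_out

rng = np.random.default_rng(0)
N, d = 16, 8                               # 16 tokens, 8-dim embedding
x = rng.standard_normal((N, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))

attn, enc_out = vit_block(x, wq, wk, wv)   # dual skip signals
dec_feat = rng.standard_normal((N, d))    # stand-in decoder feature
fused = dsufm_like_fusion(attn, enc_out, dec_feat)
```

The point of the sketch is only that the encoder hands two tensors, not one, across the skip connection, so the decoder can reuse the encoder's global similarity structure rather than recomputing it.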

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Semantic Segmentation | INRIA Aerial Image Labeling | SDSC-UNet | IoU | 83.01 | # 4 |
| Extracting Buildings In Remote Sensing Images | Massachusetts building dataset | SDSC-UNet | IoU | 76.71 | # 1 |
