TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Referring Expression Segmentation	RefCOCOg-test	MagNet	Overall IoU	66.03	# 6
Referring Expression Segmentation	RefCOCOg-val	MagNet	Overall IoU	65.36	# 7
Referring Expression Segmentation	RefCOCO testA	MagNet	Overall IoU	78.24	# 2
Referring Expression Segmentation	RefCOCO testA	MagNet	Overall IoU	78.24	# 6
Referring Expression Segmentation	RefCOCO+ testA	MagNet	Overall IoU	71.32	# 7
Referring Expression Segmentation	RefCOCO testB	MagNet	Overall IoU	71.05	# 5
Referring Expression Segmentation	RefCOCO testB	MagNet	Overall IoU	71.05	# 2
Referring Expression Segmentation	RefCOCO+ testB	MagNet	Overall IoU	58.14	# 7
Referring Expression Segmentation	RefCOCO val	MagNet	Overall IoU	75.24	# 7
Referring Expression Segmentation	RefCOCO val	MagNet	Overall IoU	75.24	# 4
Referring Expression Segmentation	RefCOCO+ val	MagNet	Overall IoU	66.16	# 9
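
The metric in the table, Overall IoU, is commonly defined in referring segmentation work as the total intersection divided by the total union, accumulated over every sample in a split (as opposed to averaging per-sample IoU). The sketch below illustrates that definition, assuming binary masks given as nested lists; the function name and the toy masks are hypothetical, and the paper's own evaluation code may differ. The table values appear to be percentages, whereas this sketch returns a fraction.

```python
def overall_iou(pred_masks, gt_masks):
    """Cumulative intersection over cumulative union across all samples.

    pred_masks, gt_masks: sequences of 2D binary masks (nested lists of 0/1).
    """
    total_inter = 0
    total_union = 0
    for pred, gt in zip(pred_masks, gt_masks):
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                p, g = bool(p), bool(g)
                total_inter += int(p and g)
                total_union += int(p or g)
    # Guard against a split with no foreground pixels at all.
    return total_inter / total_union if total_union else 0.0


# Toy data (hypothetical): two 2x2 predictions and their ground-truth masks.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
gts = [[[1, 0], [0, 0]], [[1, 0], [1, 0]]]
score = overall_iou(preds, gts)
```

Because pixels are pooled before dividing, samples with large objects contribute more to this score than samples with small ones.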

Mask Grounding for Referring Image Segmentation

19 Dec 2023 · Yong Xien Chng, Henry Zheng, Yizeng Han, Xuchong Qiu, Gao Huang

Referring Image Segmentation (RIS) is a challenging task that requires an algorithm to segment objects referred to by free-form language expressions. Despite significant progress in recent years, most state-of-the-art (SOTA) methods still suffer from a considerable language-image modality gap at the pixel and word level. These methods generally 1) rely on sentence-level language features for language-image alignment and 2) lack explicit training supervision for fine-grained visual grounding. Consequently, they exhibit weak object-level correspondence between visual and language features. Without well-grounded features, prior methods struggle to understand complex expressions that require strong reasoning over relationships among multiple objects, especially when dealing with rarely used or ambiguous clauses. To tackle this challenge, we introduce a novel Mask Grounding auxiliary task that significantly improves visual grounding within language features by explicitly teaching the model to learn fine-grained correspondence between masked textual tokens and their matching visual objects. Mask Grounding can be applied directly to prior RIS methods and consistently brings improvements. Furthermore, to holistically address the modality gap, we also design a cross-modal alignment loss and an accompanying alignment module. These additions work synergistically with Mask Grounding. Combining all these techniques, our approach culminates in MagNet (Mask-grounded Network), an architecture that significantly outperforms prior art on three key benchmarks (RefCOCO, RefCOCO+ and G-Ref), demonstrating our method's effectiveness in addressing current limitations of RIS algorithms. Our code and pre-trained weights will be released.
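
The abstract describes Mask Grounding as training the model to recover masked textual tokens using their matching visual objects. A minimal sketch of the text-side mechanics follows, under loud assumptions: the `[MASK]` token, the masking probability, the helper names, and the negative log-likelihood loss are all illustrative choices, not details from the paper. The predictor is replaced by a uniform placeholder distribution; how visual features enter the prediction is not specified in the abstract and is not modeled here.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def mask_tokens(tokens, mask_prob, rng):
    """Randomly replace tokens with MASK; return the masked sequence and positions."""
    masked, positions = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            positions.append(i)
        else:
            masked.append(tok)
    return masked, positions


def masked_token_loss(pred_probs, tokens, positions):
    """Mean negative log-likelihood of the original tokens at masked positions.

    pred_probs[i] maps token -> probability and stands in for a model's output.
    """
    if not positions:
        return 0.0
    nll = [-math.log(pred_probs[i][tokens[i]]) for i in positions]
    return sum(nll) / len(nll)


# Toy referring expression (hypothetical).
tokens = ["the", "red", "cup", "on", "the", "left"]
vocab = sorted(set(tokens))
rng = random.Random(0)
masked, positions = mask_tokens(tokens, mask_prob=0.3, rng=rng)

# Placeholder predictor: uniform over the toy vocabulary. A real model would
# condition on the image (and the masked sentence) to sharpen this distribution.
pred_probs = {i: {t: 1.0 / len(vocab) for t in vocab} for i in positions}
loss = masked_token_loss(pred_probs, tokens, positions)
```

With the uniform placeholder, the loss at any masked position equals the log of the vocabulary size; a predictor that actually grounds the masked word in the image would drive it lower, which is the signal an auxiliary task of this kind would provide.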

Tasks

Image Segmentation

Referring Expression Segmentation

Segmentation

Semantic Segmentation

Sentence

Visual Grounding

Datasets

MS COCO

RefCOCO

Google Refexp

Results from the Paper

Ranked #2 on Referring Expression Segmentation on RefCOCO testB
