Zero-Shot Semantic Segmentation
17 papers with code • 4 benchmarks • 3 datasets
Most implemented papers
Zero-Shot Semantic Segmentation
Semantic segmentation models are limited in their ability to scale to large numbers of object classes.
Context-aware Feature Generation for Zero-shot Semantic Segmentation
In this paper, we propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet.
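As a rough illustration of the feature-generation idea, the sketch below maps a class's semantic word embedding plus a contextual latent code to a synthetic pixel-level visual feature, so a classifier can also be trained on generated features for unseen classes. All names, dimensions, and the form of the contextual latent are illustrative assumptions, not CaGNet's actual code.

```python
import torch
import torch.nn as nn

# Minimal sketch of context-aware feature generation (assumed
# shapes; not the authors' implementation).
class FeatureGenerator(nn.Module):
    def __init__(self, sem_dim=300, latent_dim=16, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sem_dim + latent_dim, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, feat_dim),
        )

    def forward(self, word_emb, context_latent):
        # word_emb: (N, sem_dim) class word embeddings;
        # context_latent: (N, latent_dim) image-derived contextual code
        return self.net(torch.cat([word_emb, context_latent], dim=1))

gen = FeatureGenerator()
unseen_emb = torch.randn(8, 300)   # stand-in for word vectors of unseen classes
ctx = torch.randn(8, 16)           # stand-in for contextual latents
fake_feats = gen(unseen_emb, ctx)  # (8, 512) synthesized pixel features
```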
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model
However, semantic segmentation and CLIP operate at different visual granularities: semantic segmentation makes predictions at the pixel level, while CLIP operates on whole images.
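The two-stage remedy this baseline builds on can be sketched as follows: a proposal network produces class-agnostic masks, and each masked region is classified by matching its image embedding against text embeddings of class names. The random tensors below stand in for real CLIP features; the logit scale of 100 follows CLIP's convention, everything else is an assumption.

```python
import torch
import torch.nn.functional as F

num_masks, num_classes, dim = 10, 5, 512
region_emb = F.normalize(torch.randn(num_masks, dim), dim=-1)   # CLIP features of masked crops (assumed)
text_emb = F.normalize(torch.randn(num_classes, dim), dim=-1)   # CLIP features of class prompts (assumed)

logits = 100.0 * region_emb @ text_emb.t()  # cosine similarity with CLIP's logit scale
mask_labels = logits.argmax(dim=-1)         # zero-shot class for each mask proposal
```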
Learning unbiased zero-shot semantic segmentation networks via transductive transfer
Our method assumes that both the source images with full pixel-level labels and unlabeled target images are available during training.
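A generic transductive recipe consistent with this setup (an illustrative stand-in, not necessarily the paper's exact loss) combines supervised cross-entropy on labeled source pixels with a confidence-thresholded pseudo-label term on unlabeled target pixels, exposing the network to unseen-class statistics during training.

```python
import torch
import torch.nn.functional as F

def transductive_loss(src_logits, src_labels, tgt_logits, threshold=0.9):
    sup = F.cross_entropy(src_logits, src_labels)
    probs = tgt_logits.softmax(dim=1)
    conf, pseudo = probs.max(dim=1)
    mask = conf > threshold                              # trust only confident pixels
    unsup = F.cross_entropy(tgt_logits, pseudo, reduction="none")
    unsup = (unsup * mask).sum() / mask.sum().clamp(min=1)
    return sup + unsup

src_logits = torch.randn(2, 21, 8, 8)                    # (batch, classes, H, W)
src_labels = torch.randint(0, 21, (2, 8, 8))
tgt_logits = torch.randn(2, 21, 8, 8)                    # unlabeled target predictions
loss = transductive_loss(src_logits, src_labels, tgt_logits)
```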
From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation
We therefore focus on zero-shot semantic segmentation, which aims to segment objects of unseen categories given only their category-level semantic representations.
Extract Free Dense Labels from CLIP
Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition.
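The dense-label idea can be sketched like this: keep per-patch visual embeddings instead of CLIP's pooled global one, then label every patch with its nearest class text embedding, yielding a segmentation map without any training. The random tensors stand in for CLIP features; obtaining real per-patch embeddings requires the paper's modification of CLIP's attention pooling.

```python
import torch
import torch.nn.functional as F

H, W, dim, num_classes = 14, 14, 512, 20
patch_emb = F.normalize(torch.randn(H * W, dim), dim=-1)      # per-patch CLIP features (assumed)
text_emb = F.normalize(torch.randn(num_classes, dim), dim=-1) # class text features (assumed)

# dense pseudo-labels: nearest text embedding per patch
seg = (patch_emb @ text_emb.t()).argmax(dim=-1).view(H, W)
```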
Decoupling Zero-Shot Semantic Segmentation
We decouple zero-shot semantic segmentation into two sub-tasks: 1) a class-agnostic grouping task that groups pixels into segments, and 2) a zero-shot classification task on segments.
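A minimal sketch of the decoupled pipeline (illustrative shapes and operations, not the authors' architecture): learned queries produce class-agnostic masks, each segment is pooled into a single embedding, and segments are then classified zero-shot against class embeddings.

```python
import torch
import torch.nn.functional as F

dim, H, W, num_seg, num_cls = 256, 16, 16, 8, 20
pixel_feat = torch.randn(dim, H, W)
queries = torch.randn(num_seg, dim)                     # learned segment queries (assumed)

# 1) class-agnostic grouping: one soft mask per query
masks = torch.einsum('qd,dhw->qhw', queries, pixel_feat).sigmoid()
area = masks.sum(dim=(1, 2)).clamp(min=1e-6)
seg_emb = torch.einsum('qhw,dhw->qd', masks, pixel_feat) / area[:, None]

# 2) zero-shot classification of pooled segment embeddings
cls_emb = torch.randn(num_cls, dim)                     # class embeddings (assumed)
logits = F.normalize(seg_emb, dim=-1) @ F.normalize(cls_emb, dim=-1).t()
seg_classes = logits.argmax(dim=-1)
```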
Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability in downstream vision tasks with appropriate text prompts.
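Prompt tuning in this setting typically replaces hand-written templates like "a photo of a {class}" with a few learned context vectors prepended to each class-name embedding, while the VLM itself stays frozen. A minimal CoOp-style sketch, with all names and shapes as assumptions:

```python
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    def __init__(self, n_ctx=4, emb_dim=512, num_classes=20):
        super().__init__()
        # learned context vectors, shared across classes
        self.ctx = nn.Parameter(torch.randn(n_ctx, emb_dim) * 0.02)
        # stand-in for frozen token embeddings of the class names
        self.register_buffer('cls_tok', torch.randn(num_classes, 1, emb_dim))

    def forward(self):
        ctx = self.ctx.unsqueeze(0).expand(self.cls_tok.size(0), -1, -1)
        # prompts are fed to the frozen text encoder; only self.ctx trains
        return torch.cat([ctx, self.cls_tok], dim=1)

prompts = LearnablePrompt()()   # (num_classes, n_ctx + 1, emb_dim)
```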
ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation
Recently, CLIP has been applied to pixel-level zero-shot learning tasks via a two-stage scheme.
Zero-Shot Point Cloud Segmentation by Semantic-Visual Aware Synthesis
Given only the class-level semantic information for unseen objects, we strive to enhance the correspondence, alignment and consistency between the visual and semantic spaces, to synthesise diverse, generic and transferable visual features.
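One simple way to encode such a visual-semantic alignment (a generic stand-in, not the paper's actual objective) is to penalize the cosine distance between synthesized point features, projected into the semantic space, and their class's semantic embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, sem_dim, n = 256, 300, 64
proj = nn.Linear(feat_dim, sem_dim)               # visual-to-semantic projection (assumed)
synth_feats = torch.randn(n, feat_dim)            # generated point-cloud features (stand-in)
class_emb = F.normalize(torch.randn(sem_dim), dim=0)

# alignment term: projected features should agree with the class embedding
align = 1 - F.cosine_similarity(proj(synth_feats), class_emb.expand(n, -1), dim=1).mean()
```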