Open Vocabulary Panoptic Segmentation
7 papers with code • 1 benchmark • 1 dataset
Most implemented papers
Panoptic Vision-Language Feature Fields
In this paper, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes.
Extract Free Dense Labels from CLIP
Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition.
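A rough sketch of the idea, assuming per-patch features and class-name embeddings are already available from a CLIP-like model; the tensors below are random stand-ins, and the shapes and names are assumptions rather than the paper's implementation:

# Hypothetical sketch: dense zero-shot labeling from CLIP-like features.
import torch
import torch.nn.functional as F

H, W, D = 32, 32, 512                           # feature-map size and embedding dim (assumed)
class_names = ["person", "car", "tree", "sky"]

patch_feats = torch.randn(H * W, D)             # per-patch visual features (stand-in for CLIP's image encoder)
text_embeds = torch.randn(len(class_names), D)  # class-name embeddings (stand-in for CLIP's text encoder)

# Cosine similarity between every patch and every class prompt,
# then an argmax per patch gives a coarse dense label map with no training.
patch_feats = F.normalize(patch_feats, dim=-1)
text_embeds = F.normalize(text_embeds, dim=-1)
logits = patch_feats @ text_embeds.t()          # (H*W, num_classes)
label_map = logits.argmax(dim=-1).view(H, W)    # dense pseudo-labels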
Open-Vocabulary Universal Image Segmentation with MaskCLIP
In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories specified by text descriptions at inference time.
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks.
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
The proposed FC-CLIP benefits from the following observations: the frozen CLIP backbone retains its open-vocabulary classification ability and can also serve as a strong mask generator, and the convolutional CLIP generalizes well to input resolutions larger than the one used during contrastive image-text pretraining.
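The mask-pooling step that such a single frozen backbone relies on can be sketched as follows; the frozen CLIP feature map, the mask proposals, and all shapes are stand-in assumptions, not the paper's implementation:

import torch
import torch.nn.functional as F

H, W, D, num_masks = 64, 64, 512, 10
dense_feats = torch.randn(H, W, D)                      # frozen CLIP backbone feature map (stand-in)
masks = torch.rand(num_masks, H, W) > 0.5               # binary mask proposals (stand-in)
text_embeds = F.normalize(torch.randn(5, D), dim=-1)    # embeddings of 5 candidate class names (stand-in)

# Mask pooling: average the dense features inside each mask, then classify
# the pooled embedding against the text embeddings (open-vocabulary).
m = masks.float().unsqueeze(-1)                                   # (num_masks, H, W, 1)
pooled = (m * dense_feats).sum(dim=(1, 2)) / m.sum(dim=(1, 2)).clamp(min=1)
pooled = F.normalize(pooled, dim=-1)
class_logits = pooled @ text_embeds.t()                           # (num_masks, 5)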
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
However, when transferring the vision-language alignment of CLIP from global image representations to local region representations for open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.
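One way to picture the region-level self-distillation this points to: pool the dense features over a crop and pull them toward a frozen image-level embedding of that same crop. A minimal sketch with stand-in tensors (the crop box, shapes, and loss form are assumptions):

import torch
import torch.nn.functional as F

D = 512
# Stand-ins: dense features from the ViT being fine-tuned (student) and the
# image-level embedding of the cropped region from a frozen copy (teacher).
student_dense = torch.randn(1, D, 16, 16, requires_grad=True)   # (B, D, h, w) dense ViT features
teacher_crop_embed = F.normalize(torch.randn(1, D), dim=-1)     # frozen embedding of the crop

# Pool the student's dense features over the crop region (here: rows 4-12, cols 4-12)
# and pull the pooled region embedding toward the teacher's image-level embedding.
region = student_dense[:, :, 4:12, 4:12].mean(dim=(2, 3))       # (B, D)
region = F.normalize(region, dim=-1)
distill_loss = 1 - (region * teacher_crop_embed).sum(dim=-1).mean()
distill_loss.backward()   # gradients flow back into the student's dense features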
PosSAM: Panoptic Open-vocabulary Segment Anything
In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework.
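A hedged sketch of how such a two-model composition can look at inference: class-agnostic masks from a promptable segmenter are each scored against text embeddings. The mask generator, the encode_crop helper, and all shapes below are hypothetical stand-ins, not the paper's actual pipeline:

import torch
import torch.nn.functional as F

image = torch.rand(3, 256, 256)                         # input image (stand-in)
masks = torch.rand(8, 256, 256) > 0.7                   # class-agnostic mask proposals (SAM's role, stand-in)
text_embeds = F.normalize(torch.randn(4, 512), dim=-1)  # embeddings of 4 text prompts (CLIP's role, stand-in)

def encode_crop(crop: torch.Tensor) -> torch.Tensor:
    # Hypothetical placeholder for an image encoder applied to a masked crop.
    return F.normalize(torch.randn(512), dim=0)

labels = []
for mask in masks:
    ys, xs = torch.nonzero(mask, as_tuple=True)
    if len(ys) == 0:
        labels.append(-1)                               # empty proposal, skip
        continue
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    x0, x1 = int(xs.min()), int(xs.max()) + 1
    crop = image[:, y0:y1, x0:x1]                       # crop around the mask
    score = encode_crop(crop) @ text_embeds.t()         # similarity to each prompt
    labels.append(int(score.argmax()))                  # open-vocabulary class per mask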