PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

In a joint vision-language space, a text feature (e.g., from "a photo of a dog") could effectively represent its relevant image features (e.g., from dog photos). Also, a recent study has demonstrated the cross-modal transferability phenomenon of this joint space. From these observations, we propose PromptStyler which simulates various distribution shifts in the joint space by synthesizing diverse styles via prompts without using any images to deal with source-free domain generalization. The proposed method learns to generate a variety of style features (from "a S* style of a") via learnable style word vectors for pseudo-words S*. To ensure that learned styles do not distort content information, we force style-content features (from "a S* style of a [class]") to be located nearby their corresponding content features (from "[class]") in the joint vision-language space. After learning style word vectors, we train a linear classifier using synthesized style-content features. PromptStyler achieves the state of the art on PACS, VLCS, OfficeHome and DomainNet, even though it does not require any images for training.
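The two constraints the abstract describes (styles should be diverse, and a styled prompt should stay close to its own class's content feature) can be illustrated with a small sketch. Everything here is an assumption for illustration: the abstract does not give the loss formulas, so the absolute-cosine diversity term and the cross-entropy consistency term are one plausible formulation, the function names are hypothetical, and the 3-dimensional vectors are made-up stand-ins for text-encoder outputs rather than real CLIP features.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def style_diversity_loss(new_style, previous_styles):
    """Mean absolute cosine similarity between a new style feature and earlier ones.

    Assumed form: lower values mean the new style is closer to orthogonal
    to the styles already learned.
    """
    if not previous_styles:
        return 0.0
    return sum(abs(cosine(new_style, s)) for s in previous_styles) / len(previous_styles)


def content_consistency_loss(style_content_feats, content_feats):
    """Cross-entropy over cosine similarities (assumed form).

    The style-content feature of class m should be more similar to the
    content feature of class m than to any other class's content feature.
    """
    total = 0.0
    for m, sc in enumerate(style_content_feats):
        logits = [cosine(sc, c) for c in content_feats]
        log_denominator = math.log(sum(math.exp(z) for z in logits))
        total += -(logits[m] - log_denominator)
    return total / len(style_content_feats)


# Toy 3-d features standing in for text-encoder outputs (made-up numbers).
content = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # "[class]" features for two classes
faithful = [[0.9, 0.1, 0.3], [0.1, 0.9, 0.3]]   # style shifts features, content kept
distorted = [[0.1, 0.9, 0.3], [0.9, 0.1, 0.3]]  # style swaps the class content

loss_faithful = content_consistency_loss(faithful, content)
loss_distorted = content_consistency_loss(distorted, content)
diversity_same = style_diversity_loss([0.0, 0.0, 1.0], [[0.0, 0.0, 2.0]])
diversity_orthogonal = style_diversity_loss([0.0, 0.0, 1.0], [[1.0, 0.0, 0.0]])

print(loss_faithful < loss_distorted)  # the content-preserving style scores lower
```

In this toy setup a style that keeps each class nearest to its own content feature receives a lower consistency loss than one that swaps the classes, and a new style pointing in the same direction as an earlier one is penalized more than an orthogonal one. In the method described above, the features would come from the text encoder of a vision-language model, and the learnable quantities would be the style word vectors for S*.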

ICCV 2023

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Domain Generalization | DomainNet | PromptStyler (CLIP, ViT-L/14) | Average Accuracy | 65.5 | #1 |
| Domain Generalization | DomainNet | PromptStyler (CLIP, ViT-B/16) | Average Accuracy | 59.4 | #8 |
| Domain Generalization | DomainNet | PromptStyler (CLIP, ResNet-50) | Average Accuracy | 49.5 | #13 |
| Domain Generalization | Office-Home | PromptStyler (CLIP, ViT-B/16) | Average Accuracy | 83.6 | #8 |
| Domain Generalization | Office-Home | PromptStyler (CLIP, ResNet-50) | Average Accuracy | 73.6 | #16 |
| Domain Generalization | Office-Home | PromptStyler (CLIP, ViT-L/14) | Average Accuracy | 89.1 | #1 |
| Domain Generalization | PACS | PromptStyler (CLIP, ViT-B/16) | Average Accuracy | 97.2 | #4 |
| Domain Generalization | PACS | PromptStyler (CLIP, ResNet-50) | Average Accuracy | 93.2 | #12 |
| Domain Generalization | PACS | PromptStyler (CLIP, ViT-L/14) | Average Accuracy | 98.6 | #2 |
| Domain Generalization | VLCS | PromptStyler (CLIP, ViT-L/14) | Average Accuracy | 82.4 | #7 |
| Domain Generalization | VLCS | PromptStyler (CLIP, ViT-B/16) | Average Accuracy | 82.9 | #4 |
| Domain Generalization | VLCS | PromptStyler (CLIP, ResNet-50) | Average Accuracy | 82.3 | #8 |

Methods