TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Retrieval	CIRR	CLIP4Cir (v2)	(Recall@5+Recall_subset@1)/2	69.09	# 8
Image Retrieval	Fashion IQ	CLIP4Cir (v2)	(Recall@10+Recall@50)/2	50.03	# 11
Image Retrieval	LaSCo	CLIP4CIR	Recall@1 (%)	4.01	# 3

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/conditioned-and-composed-image-retrieval/image-retrieval-on-lasco)](https://paperswithcode.com/sota/image-retrieval-on-lasco?p=conditioned-and-composed-image-retrieval)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/conditioned-and-composed-image-retrieval/image-retrieval-on-cirr)](https://paperswithcode.com/sota/image-retrieval-on-cirr?p=conditioned-and-composed-image-retrieval)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/conditioned-and-composed-image-retrieval/image-retrieval-on-fashion-iq)](https://paperswithcode.com/sota/image-retrieval-on-fashion-iq?p=conditioned-and-composed-image-retrieval)`

Conditioned and Composed Image Retrieval Combining and Partially Fine-Tuning CLIP-Based Features

CVPRW 2022 · Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto del Bimbo

In this paper, we present an approach for conditioned and composed image retrieval based on CLIP features. In this extension of content-based image retrieval (CBIR), an image is combined with a text that conveys the user's intent, a setting relevant to application domains such as e-commerce. The proposed method starts with an initial training stage in which a simple combination of visual and textual features is used to fine-tune the CLIP text encoder. In a second training stage, we then learn a more complex combiner network that merges visual and textual features. Contrastive learning is used in both stages. The proposed approach obtains state-of-the-art performance for conditioned CBIR on the FashionIQ dataset and for composed CBIR on the more recent CIRR dataset.
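
To make the first training stage more concrete, the following is a minimal sketch of a batch-wise contrastive loss over combined query features. It is not the authors' implementation: the abstract only says "a simple combination", so the element-wise sum used here is an assumption, as are the temperature value, the function names, and the toy 3-dimensional vectors standing in for CLIP features.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def combine_sum(image_feature, text_feature):
    """Assumed 'simple combination': element-wise sum, then L2 normalization."""
    return l2_normalize([a + b for a, b in zip(image_feature, text_feature)])


def contrastive_loss(queries, targets, temperature=0.07):
    """Batch-wise contrastive loss (temperature value is an assumption).

    For query i, target i is the positive and every other target in the
    batch is a negative. Queries are expected to be unit-length already.
    """
    targets = [l2_normalize(t) for t in targets]
    total = 0.0
    for i, query in enumerate(queries):
        logits = [
            sum(a * b for a, b in zip(query, target)) / temperature
            for target in targets
        ]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy stand-ins for CLIP features (hypothetical values).
reference_images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
captions = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
target_images = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]

queries = [combine_sum(i, t) for i, t in zip(reference_images, captions)]
loss_matched = contrastive_loss(queries, target_images)
loss_swapped = contrastive_loss(queries, target_images[::-1])
```

With correctly paired targets the loss is close to zero, while swapping the targets makes it large, which is the signal that would drive fine-tuning of the text encoder in a real training loop. In the second stage described above, `combine_sum` would be replaced by a learned combiner network.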

Code

abaldrati/clip4cirdemo official

ABaldrati/CLIP4Cir

135

Tasks

Composed Image Retrieval (CoIR)

Content-Based Image Retrieval

Contrastive Learning

Image Retrieval

Retrieval

Datasets

Fashion IQ

CIRR

LaSCo

Results from the Paper

Ranked #3 on Image Retrieval on LaSCo
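
The averaged recall metrics in the results table can be illustrated with a short sketch. The function names and the toy rankings below are hypothetical and are not taken from the benchmark code. The Fashion IQ metric averages Recall@10 and Recall@50 over the same ranking; the CIRR metric averages Recall@5 with Recall_subset@1, where the subset recall would need a separate ranking restricted to each query's candidate subset.

```python
def recall_at_k(ranked_lists, ground_truth, k):
    """Percentage of queries whose target appears in the top-k results."""
    hits = sum(
        1
        for ranked, target in zip(ranked_lists, ground_truth)
        if target in ranked[:k]
    )
    return 100.0 * hits / len(ground_truth)


def averaged_recall(ranked_lists, ground_truth, k_small, k_large):
    """Mean of two recall values, e.g. (Recall@10 + Recall@50) / 2."""
    return (
        recall_at_k(ranked_lists, ground_truth, k_small)
        + recall_at_k(ranked_lists, ground_truth, k_large)
    ) / 2


# Toy example: four queries, each with a ranked list of candidate image IDs.
ranked_lists = [
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["c", "a", "b"],
    ["a", "c", "b"],
]
ground_truth = ["a", "c", "b", "b"]

r1 = recall_at_k(ranked_lists, ground_truth, 1)            # 25.0
r2 = recall_at_k(ranked_lists, ground_truth, 2)            # 50.0
average = averaged_recall(ranked_lists, ground_truth, 1, 2)  # 37.5
```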

Methods

CLIP • Contrastive Learning
