Search Results for author: Marco Bertini

Found 28 papers, 17 papers with code

Task-conditioned Domain Adaptation for Pedestrian Detection in Thermal Imagery

1 code implementation • ECCV 2020 • My Kieu, Andrew D. Bagdanov, Marco Bertini, Alberto del Bimbo

Despite its broad application and interest, it remains a challenging problem in part due to the vast range of conditions under which it must be robust.

Domain Adaptation Pedestrian Detection

iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval

2 code implementations • 5 May 2024 • Lorenzo Agnolucci, Alberto Baldrati, Marco Bertini, Alberto del Bimbo

Given a query consisting of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption.

Ranked #1 on Zero-Shot Composed Image Retrieval (ZS-CIR) on ImageNet-R

Benchmarking Retrieval +1

117

Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing

2 code implementations • 21 Mar 2024 • Alberto Baldrati, Davide Morelli, Marcella Cornia, Marco Bertini, Rita Cucchiara

Fashion illustration is a crucial medium for designers to convey their creative vision and transform design concepts into tangible representations that showcase the interplay between clothing and the human body.

Denoising Virtual Try-on

373

Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment

1 code implementation • 17 Mar 2024 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini

In particular, we introduce a quality-aware image-text alignment strategy to make CLIP generate representations that correlate with the inherent quality of the images.

Blind Image Quality Assessment No-Reference Image Quality Assessment +1

Perceptual Quality Improvement in Videoconferencing using Keyframes-based GAN

1 code implementation • 7 Nov 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo

Given that, in this context, the speaker is typically in front of the camera and remains the same for the entire duration of the transmission, we can maintain a set of reference keyframes of the person from the higher-quality I-frames that are transmitted within the video stream and exploit them to guide the visual quality improvement; a novel aspect of this approach is the update policy that maintains and updates a compact and effective set of reference keyframes.

Video Compression

Restoration of Analog Videos Using Swin-UNet

1 code implementation • 7 Nov 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo

In this paper, we present a system to restore analog videos of historical archives.

Ranked #2 on Analog Video Restoration on TAPE

Analog Video Restoration

ARNIQA: Learning Distortion Manifold for Image Quality Assessment

1 code implementation • 20 Oct 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo

In this work, we propose a self-supervised approach named ARNIQA (leArning distoRtion maNifold for Image Quality Assessment) for modeling the image distortion manifold to obtain quality representations in an intrinsic manner.

Ranked #2 on No-Reference Image Quality Assessment on CSIQ

Blind Image Quality Assessment No-Reference Image Quality Assessment +1

Reference-based Restoration of Digitized Analog Videotapes

2 code implementations • 20 Oct 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo

We design a transformer-based Swin-UNet network that exploits both neighboring and reference frames via our Multi-Reference Spatial Feature Fusion (MRSFF) blocks.

Ranked #1 on Analog Video Restoration on TAPE

Analog Video Restoration Artifact Detection

Mapping Memes to Words for Multimodal Hateful Meme Classification

1 code implementation • 12 Oct 2023 • Giovanni Burbi, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto del Bimbo

Multimodal image-text memes are prevalent on the internet, serving as a unique form of communication that combines visual and textual elements to convey humor, ideas, or emotions.

Ranked #1 on Hateful Meme Classification on HarMeme

Hateful Meme Classification Language Modelling

Exploiting CLIP-based Multi-modal Approach for Artwork Classification and Retrieval

no code implementations • 21 Sep 2023 • Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto del Bimbo

Given the recent advances in multimodal image pretraining, where visual models trained with semantically dense textual supervision tend to have better generalization capabilities than those trained using categorical attributes or through unsupervised techniques, in this work we investigate how the recent CLIP model can be applied to several tasks in the artwork domain.

Retrieval Zero-Shot Learning

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

1 code implementation • 11 Sep 2023 • Giuseppe Cartella, Alberto Baldrati, Davide Morelli, Marcella Cornia, Marco Bertini, Rita Cucchiara

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements.

Contrastive Learning Domain Generalization +2

Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

1 code implementation • 22 Aug 2023 • Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto del Bimbo

Given a query composed of a reference image and a relative caption, the goal of Composed Image Retrieval is to retrieve images that are visually similar to the reference one and integrate the modifications expressed by the caption.

Ranked #6 on Image Retrieval on CIRR

Contrastive Learning Image Retrieval +1

137

ECO: Ensembling Context Optimization for Vision-Language Models

no code implementations • 26 Jul 2023 • Lorenzo Agnolucci, Alberto Baldrati, Francesco Todino, Federico Becattini, Marco Bertini, Alberto del Bimbo

Among these, the CLIP model has shown remarkable capabilities for zero-shot transfer by matching an image and a custom textual prompt in its latent space.

Classification Image Classification

4DSR-GCN: 4D Video Point Cloud Upsampling using Graph Convolutional Networks

no code implementations • 1 Jun 2023 • Lorenzo Berlincioni, Stefano Berretti, Marco Bertini, Alberto del Bimbo

Time-varying sequences of 3D point clouds, or 4D point clouds, are now being acquired at an increasing pace in several applications (e.g., LiDAR in autonomous or assisted driving).

Edge-computing Graph Attention +1

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

1 code implementation • 22 May 2023 • Davide Morelli, Alberto Baldrati, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

In this context, image-based virtual try-on, which consists in generating a novel image of a target model wearing a given in-shop garment, has yet to capitalize on the potential of these powerful generative solutions.

Virtual Try-on

378

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

1 code implementation • ICCV 2023 • Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner.

Multimodal fashion image editing

373

Error assessment of microwave holography inversion for shallow buried objects

no code implementations • 27 Mar 2023 • Emanuele Vivoli, Luca Bossi, Marco Bertini, Pierluigi Falorni, Lorenzo Capineri

Holographic imaging is a technique that uses microwave energy to create a three-dimensional image of an object or scene.

Zero-Shot Composed Image Retrieval with Textual Inversion

2 code implementations • ICCV 2023 • Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto del Bimbo

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images.

Ranked #1 on Zero-Shot Composed Image Retrieval (ZS-CIR) on FashionIQ

Retrieval Zero-Shot Composed Image Retrieval (ZS-CIR)

117

Conditioned and Composed Image Retrieval Combining and Partially Fine-Tuning CLIP-Based Features

2 code implementations • CVPRW 2022 • Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto del Bimbo

The proposed method is based on an initial training stage where a simple combination of visual and textual features is used to fine-tune the CLIP text encoder.

Ranked #3 on Image Retrieval on LaSCo

Composed Image Retrieval (CoIR) Content-Based Image Retrieval +2

137

Effective Conditioned and Composed Image Retrieval Combining CLIP-Based Features

2 code implementations • CVPR 2022 • Alberto Baldrati, Marco Bertini, Tiberio Uricchio, Alberto del Bimbo

the visual content of the query image.

Ranked #9 on Image Retrieval on CIRR

Composed Image Retrieval (CoIR) Contrastive Learning +1

137

Partially fake it till you make it: mixing real and fake thermal images for improved object detection

no code implementations • 25 Jun 2021 • Francesco Bongini, Lorenzo Berlincioni, Marco Bertini, Alberto del Bimbo

In this paper we propose a novel data augmentation approach for visual content domains that have scarce training datasets, compositing synthetic 3D objects within real scenes.

Data Augmentation object-detection +1

Robust pedestrian detection in thermal imagery using synthesized images

no code implementations • 3 Feb 2021 • My Kieu, Lorenzo Berlincioni, Leonardo Galteri, Marco Bertini, Andrew D. Bagdanov, Alberto del Bimbo

Experimental results demonstrate the effectiveness of our approach: using less than 50% of available real thermal training data, and relying on synthesized data generated by our model in the domain adaptation phase, our detector achieves state-of-the-art results on the KAIST Multispectral Pedestrian Detection Benchmark; even if more real thermal data is available, adding GAN-generated images to the training data results in improved performance, thus showing that these images act as an effective form of data augmentation.

Data Augmentation Domain Adaptation +2

Inner Eye Canthus Localization for Human Body Temperature Screening

no code implementations • 27 Aug 2020 • Claudio Ferrari, Lorenzo Berlincioni, Marco Bertini, Alberto del Bimbo

As an additional contribution, we enrich the original dataset by using the annotated landmarks to deform and project the 3DMM onto the images.

Face Model

Image Retrieval using Multi-scale CNN Features Pooling

no code implementations • 21 Apr 2020 • Federico Vaccaro, Marco Bertini, Tiberio Uricchio, Alberto del Bimbo

In this paper, we address the problem of image retrieval by learning image representations based on the activations of a Convolutional Neural Network.

Image Retrieval Retrieval

Deep Generative Adversarial Compression Artifact Removal

no code implementations • ICCV 2017 • Leonardo Galteri, Lorenzo Seidenari, Marco Bertini, Alberto del Bimbo

Moreover, we show that our approach can be used as a pre-processing step for object detection in cases where images are degraded by compression to the point that state-of-the-art detectors fail.

object-detection Object Detection +1

Compact Hash Codes for Efficient Visual Descriptors Retrieval in Large Scale Databases

no code implementations • 10 May 2016 • Simone Ercoli, Marco Bertini, Alberto del Bimbo

In this paper we present an efficient method for visual descriptors retrieval based on compact hash codes computed using a multiple k-means assignment.

Retrieval

Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval

1 code implementation • 28 Mar 2015 • Xirong Li, Tiberio Uricchio, Lamberto Ballan, Marco Bertini, Cees G. M. Snoek, Alberto del Bimbo

Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image.

Content-Based Image Retrieval Retrieval +1

A Data-Driven Approach for Tag Refinement and Localization in Web Videos

no code implementations • 2 Jul 2014 • Lamberto Ballan, Marco Bertini, Giuseppe Serra, Alberto del Bimbo

Our approach exploits collective knowledge embedded in user-generated tags and web sources, and visual similarity of keyframes and images uploaded to social sites like YouTube and Flickr, as well as web sources like Google and Bing.

TAG
