Search Results for author: Marcella Cornia

Found 43 papers, 24 papers with code

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

no code implementations • 23 Apr 2024 • Davide Caffagni, Federico Cocchi, Nicholas Moratelli, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Multimodal LLMs are the natural evolution of LLMs, and enlarge their capabilities so as to work beyond the pure textual modality.

Question Answering Retrieval +1

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation

no code implementations • 9 Apr 2024 • Luca Barsellotti, Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Open-vocabulary semantic segmentation aims at segmenting arbitrary categories expressed in textual form.

Open Vocabulary Semantic Segmentation Semantic Segmentation

Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing

2 code implementations • 21 Mar 2024 • Alberto Baldrati, Davide Morelli, Marcella Cornia, Marco Bertini, Rita Cucchiara

Fashion illustration is a crucial medium for designers to convey their creative vision and transform design concepts into tangible representations that showcase the interplay between clothing and the human body.

Denoising Virtual Try-on

372

Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images

1 code implementation • 13 Mar 2024 • Giuseppe Cartella, Vittorio Cuculo, Marcella Cornia, Rita Cucchiara

Creating high-quality and realistic images is now possible thanks to the impressive advancements in image generation.

Fake Image Detection Image Generation +1

Trends, Applications, and Challenges in Human Attention Modelling

1 code implementation • 28 Feb 2024 • Giuseppe Cartella, Marcella Cornia, Vittorio Cuculo, Alessandro D'Amelio, Dario Zanca, Giuseppe Boccignone, Rita Cucchiara

Human attention modelling has proven, in recent years, to be particularly useful not only for understanding the cognitive processes underlying visual exploration, but also for providing support to artificial intelligence models that aim to solve problems in various domains, including image and video processing, vision-and-language applications, and language modelling.

Language Modelling

The (R)Evolution of Multimodal Large Language Models: A Survey

no code implementations • 19 Feb 2024 • Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

Connecting text and visual modalities plays an essential role in generative intelligence.

Image Generation Instruction Following +1

Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models

1 code implementation • 27 Nov 2023 • Samuele Poppi, Tobia Poppi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences, and a text-to-image generator.

Cross-Modal Retrieval Image Retrieval +5

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

1 code implementation • 11 Sep 2023 • Giuseppe Cartella, Alberto Baldrati, Davide Morelli, Marcella Cornia, Marco Bertini, Rita Cucchiara

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements.

Contrastive Learning Domain Generalization +2

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

1 code implementation • ICCV 2023 • Manuele Barraco, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics in an image and translating it into linguistically coherent descriptions.

Decoder Image Captioning

Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training

1 code implementation • 12 Jun 2023 • Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, Andrea Pilzer, Rita Cucchiara

The use of self-supervised pre-training has emerged as a promising approach to enhance the performance of visual tasks such as image classification.

Image Classification

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

1 code implementation • 22 May 2023 • Davide Morelli, Alberto Baldrati, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

In this context, image-based virtual try-on, which consists in generating a novel image of a target model wearing a given in-shop garment, has yet to capitalize on the potential of these powerful generative solutions.

Virtual Try-on

377

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

1 code implementation • ICCV 2023 • Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner.

Multimodal fashion image editing

372

Multi-Class Explainable Unlearning for Image Classification via Weight Filtering

no code implementations • 4 Apr 2023 • Samuele Poppi, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Machine Unlearning has recently been emerging as a paradigm for selectively removing the impact of training datapoints from a network.

Classification Image Classification +1

Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images

1 code implementation • 2 Apr 2023 • Roberto Amoroso, Davide Morelli, Marcella Cornia, Lorenzo Baraldi, Alberto del Bimbo, Rita Cucchiara

Recent advancements in diffusion models have enabled the generation of realistic deepfakes by writing textual prompts in natural language.

Fake Image Detection

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

1 code implementation • CVPR 2023 • Sara Sarto, Manuele Barraco, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The CLIP model has been recently proven to be very effective for a variety of cross-modal tasks, including the evaluation of captions generated from vision-and-language architectures.

Contrastive Learning Image Captioning +1

Embodied Agents for Efficient Exploration and Smart Scene Description

no code implementations • 17 Jan 2023 • Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments.

Efficient Exploration Image Captioning +1

Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions

no code implementations • 17 Aug 2022 • Silvia Cascianelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant boost to the digitization of handwritten documents and reuse of their content.

Handwritten Text Recognition HTR

The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition

no code implementations • 16 Aug 2022 • Silvia Cascianelli, Vittorio Pippi, Martin Maarand, Marcella Cornia, Lorenzo Baraldi, Christopher Kermorvant, Rita Cucchiara

With the aim of fostering the research on this topic, in this paper we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years.

Handwritten Text Recognition HTR

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

1 code implementation • 29 Jul 2022 • Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato, Rita Cucchiara

In literature, this task is often used as a pre-training objective to forge architectures able to jointly deal with images and texts.

Ranked #21 on Cross-Modal Retrieval on COCO 2014 (using extra training data)

Image-text matching Retrieval +1

Retrieval-Augmented Transformer for Image Captioning

no code implementations • 26 Jul 2022 • Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

In this paper, we investigate the development of an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process.

Image Captioning Retrieval

Embodied Navigation at the Art Gallery

no code implementations • 19 Apr 2022 • Roberto Bigazzi, Federico Landi, Silvia Cascianelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

This feature is challenging for occupancy-based agents which are usually trained in crowded domestic environments with plenty of occupancy information.

Navigate PointGoal Navigation

Dress Code: High-Resolution Multi-Category Virtual Try-On

1 code implementation • 18 Apr 2022 • Davide Morelli, Matteo Fincato, Marcella Cornia, Federico Landi, Fabio Cesari, Rita Cucchiara

Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024x768) with front-view, full-body reference models.

Ranked #5 on Virtual Try-on on VITON

Virtual Try-on Vocal Bursts Intensity Prediction

423

Spot the Difference: A Novel Task for Embodied Agents in Changing Environments

no code implementations • 18 Apr 2022 • Federico Landi, Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

To make a step towards this setting, we propose Spot the Difference: a novel task for Embodied AI where the agent has access to an outdated map of the environment and needs to recover the correct layout in a fixed time budget.

CaMEL: Mean Teacher Learning for Image Captioning

1 code implementation • 21 Feb 2022 • Manuele Barraco, Matteo Stefanini, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

Describing images in natural language is a fundamental step towards the automatic modeling of connections between the visual and textual modalities.

Image Captioning Knowledge Distillation

Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets

no code implementations • 24 Nov 2021 • Marcella Cornia, Lorenzo Baraldi, Giuseppe Fiameni, Rita Cucchiara

This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources, containing both human-annotated and web-collected captions.

Descriptive Image Captioning +2

Focus on Impact: Indoor Exploration with Intrinsic Motivation

1 code implementation • 14 Sep 2021 • Roberto Bigazzi, Federico Landi, Silvia Cascianelli, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

The proposed exploration approach outperforms DRL-based competitors relying on intrinsic rewards and surpasses the agents trained with a dense extrinsic reward computed with the environment layouts.

Working Memory Connections for LSTM

no code implementations • 31 Aug 2021 • Federico Landi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.

From Show to Tell: A Survey on Deep Learning-based Image Captioning

no code implementations • 14 Jul 2021 • Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Silvia Cascianelli, Giuseppe Fiameni, Rita Cucchiara

Starting from 2015 the task has generally been addressed with pipelines composed of a visual encoder and a language model for text generation.

Image Captioning Language Modelling +1

Learning to Select: A Fully Attentive Approach for Novel Object Captioning

no code implementations • 2 Jun 2021 • Marco Cagrandi, Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara

In this paper, we present a novel approach for NOC that learns to select the most relevant objects of an image, regardless of their adherence to the training set, and to constrain the generative process of a language model accordingly.

Image Captioning Language Modelling

Out of the Box: Embodied Navigation in the Real World

1 code implementation • 12 May 2021 • Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

In this work, we detail how to transfer the knowledge acquired in simulation into the real world.

PointGoal Navigation Visual Navigation

Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis

1 code implementation • 20 Apr 2021 • Samuele Poppi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

As the request for deep learning solutions increases, the need for explainability is even more fundamental.

Attribute Explainable artificial intelligence

Explore and Explain: Self-supervised Navigation and Recounting

no code implementations • 14 Jul 2020 • Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees during the path.

Navigate

A Novel Attention-based Aggregation Function to Combine Vision and Language

no code implementations • 27 Apr 2020 • Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The joint understanding of vision and language has been recently gaining a lot of attention in both the Computer Vision and Natural Language Processing communities, with the emergence of tasks such as image captioning, image-text matching, and visual question answering.

General Classification Image Captioning +4

Meshed-Memory Transformer for Image Captioning

2 code implementations • CVPR 2020 • Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara

Transformer-based architectures represent the state of the art in sequence modeling tasks like machine translation and language understanding.

Ranked #2 on Image Captioning on MS COCO

Image Captioning Machine Translation +2

503

Multimodal Attention Networks for Low-Level Vision-and-Language Navigation

1 code implementation • 27 Nov 2019 • Federico Landi, Lorenzo Baraldi, Marcella Cornia, Massimiliano Corsini, Rita Cucchiara

Vision-and-Language Navigation (VLN) is a challenging task in which an agent needs to follow a language-specified path to reach a target destination.

Vision and Language Navigation

SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability

no code implementations • 7 Oct 2019 • Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The ability to generate natural language explanations conditioned on the visual perception is a crucial step towards autonomous agents which can explain themselves and communicate with humans.

Text Generation Video Captioning

Artpedia

no code implementations • International Conference on Image Analysis and Processing 2019 • Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara

As vision and language techniques are widely applied to realistic images, there is a growing interest in designing visual-semantic models suitable for more complex and challenging scenarios.

Cross-Modal Retrieval Retrieval

M-VAD Names: a Dataset for Video Captioning with Naming

1 code implementation • 4 Mar 2019 • Stefano Pini, Marcella Cornia, Federico Bolelli, Lorenzo Baraldi, Rita Cucchiara

Current movie captioning architectures are not capable of mentioning characters with their proper name, replacing them with a generic "someone" tag.

TAG Video Captioning

Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation

1 code implementation • CVPR 2019 • Matteo Tomei, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The applicability of computer vision to real paintings and artworks has been rarely investigated, even though a vast heritage would greatly benefit from techniques which can understand and process data from the artistic domain.

Image-to-Image Translation Translation

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

1 code implementation • CVPR 2019 • Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Current captioning approaches can describe images using black-box architectures whose behavior is hardly controllable and explainable from the exterior.

controllable image captioning

282

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

no code implementations • 26 Jun 2017 • Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara

Image captioning has been recently gaining a lot of attention thanks to the impressive achievements shown by deep captioning architectures, which combine Convolutional Neural Networks to extract image representations, and Recurrent Neural Networks to generate the corresponding captions.

Ranked #2 on Image Captioning on Flickr30k Captions test (using extra training data)

Image Captioning Saliency Prediction

Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model

2 code implementations • 29 Nov 2016 • Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara

Data-driven saliency has recently gained a lot of attention thanks to the use of Convolutional Neural Networks for predicting gaze fixations.

Saliency Prediction

205

A Deep Multi-Level Network for Saliency Prediction

2 code implementations • 5 Sep 2016 • Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara

Current state of the art models for saliency prediction employ Fully Convolutional networks that perform a non-linear combination of features extracted from the last convolutional layer to predict saliency maps.

Saliency Prediction
