no code implementations • 6 May 2024 • Li Mi, Xianjie Dai, Javiera Castillo-Navarro, Devis Tuia
As a matching-based task, cross-modal text-image retrieval often suffers from information asymmetry between texts and images.
no code implementations • 20 Feb 2024 • Li Mi, Syrielle Montariol, Javiera Castillo-Navarro, Xianjie Dai, Antoine Bosselut, Devis Tuia
Generating focused questions under textual constraints while keeping them highly relevant to the image content remains a challenge, as visual question generation (VQG) systems often ignore one or both forms of grounding.