no code implementations • RepL4NLP (ACL) 2022 • Romain Bielawski, Benjamin Devillers, Tim Van De Cruys, Rufin VanRullen
We compare CLIP’s visual stream against two visually trained networks and CLIP’s textual stream against two linguistically trained networks, as well as multimodal combinations of these networks.
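Stream-by-stream comparisons like this are often summarized by how similarly two networks organize the same inputs, e.g. via pairwise similarity structure over their feature vectors. A minimal sketch with hypothetical features (the paper's actual probing protocol is not specified here):

```python
import math

def cosine(u, v):
    # cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similarity_matrix(feats):
    # pairwise similarities: one way to summarize how a network
    # organizes a set of inputs, independent of feature dimension
    return [[cosine(u, v) for v in feats] for u in feats]

# hypothetical 3-D features for the same 3 inputs from two models
model_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
model_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [1.0, 0.8, 0.0]]

sim_a = similarity_matrix(model_a)
sim_b = similarity_matrix(model_b)
```

Comparing `sim_a` and `sim_b` (rather than the raw features) lets networks with different feature dimensions be compared on equal footing.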
no code implementations • 7 Mar 2024 • Léopold Maytié, Benjamin Devillers, Alexandre Arnold, Rufin VanRullen
First, we train a 'Global Workspace' to exploit information collected about the environment via two input modalities (a visual input, or an attribute vector representing the state of the agent and/or its environment).
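In this sense, a Global Workspace maps each input modality into one shared latent space, so a downstream policy can read a single representation whichever modality is available. A minimal sketch with hypothetical linear encoders (dimensions and weights are illustrative, not taken from the paper):

```python
import random

random.seed(0)

LATENT = 4  # shared workspace dimensionality (illustrative)

def make_encoder(in_dim, out_dim):
    # random linear map standing in for a trained modality encoder
    w = [[random.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

encode_vision = make_encoder(8, LATENT)  # e.g. visual features
encode_attrs = make_encoder(3, LATENT)   # e.g. agent/environment attributes

def workspace(vision=None, attrs=None):
    # either modality (or both, averaged) yields one shared latent
    zs = []
    if vision is not None:
        zs.append(encode_vision(vision))
    if attrs is not None:
        zs.append(encode_attrs(attrs))
    return [sum(col) / len(zs) for col in zip(*zs)]

z = workspace(attrs=[0.5, -1.0, 2.0])  # works from attributes alone
```

The point of the shared space is that the policy consuming `z` never needs to know which modality produced it.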
1 code implementation • 27 Jun 2023 • Benjamin Devillers, Léopold Maytié, Rufin VanRullen
Recent deep learning models can efficiently combine inputs from different modalities (e. g., images and text) and learn to align their latent representations, or to translate signals from one domain to another (as in image captioning, or text-to-image generation).
1 code implementation • CoNLL (EMNLP) 2021 • Benjamin Devillers, Bhavin Choksi, Romain Bielawski, Rufin VanRullen
Vision models trained on multimodal datasets can benefit from the wide availability of large image-caption datasets.