CVPR 2023 • Morris Alper, Michael Fiman, Hadar Averbuch-Elor
We show that SOTA multimodally trained text encoders outperform unimodally trained text encoders on visual language understanding (VLU) tasks, while underperforming them on natural language understanding (NLU) tasks, lending new context to previously mixed results regarding the NLU capabilities of multimodal models.
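To make the comparison concrete, below is a minimal, hypothetical sketch of the kind of setup this finding implies: extracting sentence embeddings from a multimodally trained text encoder (here CLIP's text tower) and a unimodally trained one (here BERT), which could then be fed to identical probing classifiers for VLU and NLU tasks. The model names, pooling choices, and probe design are illustrative assumptions, not the paper's exact experimental protocol.

```python
# Sketch (assumed setup, not the paper's exact pipeline): embed the same
# sentence with a unimodal and a multimodal text encoder for probing.
import torch
from transformers import AutoTokenizer, BertModel, CLIPTextModel

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

clip_tok = AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32")
clip_text = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

sentence = "A photo of a red apple on a wooden table."

with torch.no_grad():
    # Unimodal encoder: mean-pool BERT's final hidden states.
    b_inputs = bert_tok(sentence, return_tensors="pt")
    b_emb = bert(**b_inputs).last_hidden_state.mean(dim=1)

    # Multimodal encoder: CLIP's pooled text representation.
    c_inputs = clip_tok(sentence, return_tensors="pt")
    c_emb = clip_text(**c_inputs).pooler_output

# Both embeddings would feed identical probing classifiers, so any
# performance gap on VLU vs. NLU tasks reflects the encoders
# themselves rather than the probe architecture.
print(b_emb.shape, c_emb.shape)  # torch.Size([1, 768]) torch.Size([1, 512])
```

Holding the probe fixed across encoders is the key design choice in this kind of comparison: it isolates what the pretraining objective (text-only vs. vision-and-language) contributes to the learned text representations.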