1 code implementation • 14 Mar 2024 • Zhixuan Shen, Haonan Luo, Sijia Li, Tianrui Li
Scene-Text Visual Question Answering (ST-VQA) aims to understand scene text in images and answer questions related to the text content.
Optical Character Recognition (OCR)