no code implementations • 6 Aug 2023 • Onkar Susladkar, Prajwal Gatti, Anand Mishra
In this work, we study the task of "visually" translating scene text from a source language (e.g., English) to a target language (e.g., Chinese).
no code implementations • 16 Oct 2022 • Prajwal Gatti, Abhirama Subramanyam Penamakuri, Revant Teotia, Anand Mishra, Shubhashis Sengupta, Roshni Ramnani
To enable both commonsense and factual reasoning in image search, we present a unified framework, namely Knowledge Retrieval-Augmented Multimodal Transformer (KRAMT), that treats the named visual entities in an image as a gateway to encyclopedic knowledge and leverages them along with the natural language query to ground relevant knowledge.