no code implementations • ACL 2022 • Shumpei Miyawaki, Taku Hasegawa, Kyosuke Nishida, Takuma Kato, Jun Suzuki
We tackle the tasks of image and text retrieval using a dual-encoder model in which images and text are encoded independently.
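As a rough illustration of the dual-encoder idea described above, the following minimal sketch encodes images and text independently and compares them only through a dot product. It is not the authors' model. The fixed vocabulary `VOCAB`, the toy encoders `encode_text` and `encode_image`, and the representation of an image as a list of object labels are all hypothetical stand-ins for learned neural encoders.

```python
import math

# Hypothetical shared vocabulary; a real dual encoder learns its embedding space.
VOCAB = ["dog", "cat", "beach", "snow", "ball"]


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def encode_text(text):
    """Toy text encoder: bag-of-words counts over VOCAB, L2-normalized."""
    words = text.lower().split()
    return _normalize([float(words.count(w)) for w in VOCAB])


def encode_image(labels):
    """Toy image encoder: the 'image' is a list of object labels (an assumption)."""
    return _normalize([float(labels.count(w)) for w in VOCAB])


def retrieve(query_vec, index):
    """Rank indexed items by dot product (cosine similarity on unit vectors)."""
    scores = {
        key: sum(q * v for q, v in zip(query_vec, vec))
        for key, vec in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Image embeddings are computed once, without access to any query text.
IMAGES = {
    "img_a": ["dog", "ball"],
    "img_b": ["cat", "snow"],
    "img_c": ["dog", "beach"],
}
IMAGE_INDEX = {name: encode_image(labels) for name, labels in IMAGES.items()}

# At query time only the text is encoded; images are looked up from the index.
ranking = retrieve(encode_text("a dog on the beach"), IMAGE_INDEX)
print(ranking)
```

Because neither encoder sees the other modality's input, the image index can be built offline and reused for every query, which is the practical appeal of encoding the two modalities independently.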
1 code implementation • 12 Jan 2023 • Ryota Tanaka, Kyosuke Nishida, Kosuke Nishida, Taku Hasegawa, Itsumi Saito, Kuniko Saito
Visual question answering on document images that contain textual, visual, and layout information, called document VQA, has received much attention recently.