no code implementations • 5 Apr 2024 • Xiaocheng Luo, Yanping Chen, Ruixue Tang, Ruizhang Huang, Yongbin Qin
In this paper, a bi-consolidating model built on a two-dimensional sentence representation is proposed to address this problem by simultaneously reinforcing the local and global semantic features relevant to a relation triple.
no code implementations • 10 Oct 2020 • Ruixue Tang, Chao Ma
There are two main lines of research on visual question answering (VQA): compositional models with explicit multi-hop reasoning, and monolithic networks with implicit reasoning in the latent feature space.
1 code implementation • ECCV 2020 • Ruixue Tang, Chao Ma, Wei Emma Zhang, Qi Wu, Xiaokang Yang
However, few works study the data augmentation problem for VQA, and none of the existing image-based augmentation schemes (such as rotation and flipping) can be applied directly to VQA because of its semantic structure: each $\langle image, question, answer\rangle$ triplet must remain correct after augmentation.