no code implementations • 23 Apr 2024 • Moyuru Yamada
However, simultaneous control over both global contexts (e.g., object layouts and interactions) and local details (e.g., colors and emotions) remains a significant challenge.
1 code implementation • 15 Sep 2023 • Amir Rahimi, Vanessa D'Amario, Moyuru Yamada, Kentaro Takemoto, Tomotake Sasaki, Xavier Boix
We demonstrate that this result is independent of the similarity between the training and testing data and applies to well-known families of neural network architectures for VQA (i.e., monolithic architectures and neural module networks).
1 code implementation • 17 May 2023 • Kentaro Takemoto, Moyuru Yamada, Tomotake Sasaki, Hisanao Akima
Human-Object Interaction (HOI) detection is the task of localizing humans and objects in an image and predicting the interactions in human-object pairs.
Human-Object Interaction Detection • Systematic Generalization
no code implementations • 18 Nov 2022 • Moyuru Yamada
We then propose a targeted detection task, in which the detection targets are specified in natural language and the goal is to detect all the target objects, and only those, in a given image.
1 code implementation • 27 Jan 2022 • Moyuru Yamada, Vanessa D'Amario, Kentaro Takemoto, Xavier Boix, Tomotake Sasaki
We reveal that Neural Module Networks (NMNs), i.e., question-specific compositions of modules that each tackle a sub-task, achieve systematic generalization performance better than or similar to that of conventional Transformers, even though NMNs' modules are CNN-based.