Training Recurrent Answering Units with Joint Loss Minimization for VQA

12 Jun 2016  ·  Hyeonwoo Noh, Bohyung Han

We propose a novel algorithm for visual question answering (VQA) based on a recurrent deep neural network, where every module in the network corresponds to a complete answering unit with its own attention mechanism. The network is optimized by minimizing a loss aggregated from all the units, which share model parameters while receiving different information to compute attention probabilities. During training, each unit attends to a region of the image feature map, updates its memory based on the question and the attended image feature, and answers the question from its memory state; this procedure produces a loss at every step. The approach is motivated by our observation that answering a question often requires multi-step inference, while each problem may have its own desirable number of steps, which is difficult to identify in practice. Hence, we always make the first unit in the network solve problems, but allow it to learn knowledge from the remaining units through backpropagation unless doing so degrades the model. To implement this idea, we early-stop training each unit as soon as it starts to overfit. Note that, since more complex models tend to overfit quickly on easier questions, the last answering unit in the unfolded recurrent neural network is typically stopped first, while the first one remains last. At test time, we make a single-step prediction for a new question using the shared model. This strategy works better than the alternatives within our framework, since the selected model is trained effectively by all units without overfitting. The proposed algorithm outperforms other multi-step attention-based approaches that use single-step prediction on the VQA dataset.
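The joint-loss idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's actual architecture: the shapes, the tanh recurrence, and the `active` mask standing in for per-unit early stopping are all hypothetical stand-ins for the attention mechanism and LSTM-style memory described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy recurrent answering units with shared parameters (hypothetical
# shapes/names; the real model attends over image feature maps and
# maintains an LSTM memory, both omitted here for brevity).
D, A, T = 8, 4, 3              # memory dim, answer classes, answering units
W = rng.normal(size=(D, A))    # classifier weights, shared by every unit
q = rng.normal(size=D)         # stand-in for question + attended image feature
y = 2                          # ground-truth answer index

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def unit_loss(h):
    """Cross-entropy of one answering unit given memory state h."""
    return -np.log(softmax(h @ W)[y])

# Unfold T units; `active[t]` mimics early-stopping a unit once it starts
# to overfit (the last, most complex unit is typically stopped first).
active = [True, True, False]
h = np.zeros(D)
joint_loss = 0.0
for t in range(T):
    h = np.tanh(h + q)         # toy memory update (recurrence)
    if active[t]:
        joint_loss += unit_loss(h)

# At test time only the first unit answers (single-step prediction).
h1 = np.tanh(np.zeros(D) + q)
pred = int(np.argmax(h1 @ W))
```

Because all units share `W`, gradients from every still-active unit flow into the same parameters, which is how the first unit benefits from the deeper units' supervision while remaining the one used for prediction.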

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Visual Question Answering (VQA) | COCO Visual Question Answering (VQA) real images 1.0, multiple choice | joint-loss | Percentage correct | 67.3 | #5 |
| Visual Question Answering (VQA) | COCO Visual Question Answering (VQA) real images 1.0, open ended | joint-loss | Percentage correct | 63.2 | #5 |
| Visual Question Answering (VQA) | VQA v1 test-dev | RAU (ResNet) | Accuracy | 63.3 | #4 |
| Visual Question Answering (VQA) | VQA v1 test-std | RAU (ResNet) | Accuracy | 63.2 | #2 |