The Foes of Neural Network’s Data Efficiency Among Unnecessary Input Dimensions

1 Jan 2021  ·  Vanessa D'Amario, Sanjana Srivastava, Tomotake Sasaki, Xavier Boix

Input dimensions are unnecessary for a given task when the target function can be expressed without them. An object's background in image recognition or redundant sentences in text classification are examples of unnecessary dimensions that are often present in datasets. Deep neural networks achieve remarkable generalization performance despite the presence of unnecessary dimensions, but it is unclear whether or how these dimensions negatively affect the networks. In this paper, we investigate the impact of unnecessary input dimensions on one of the central issues of machine learning: the number of training examples needed to achieve high generalization performance, which we refer to as the network's data efficiency. In a series of analyses with multi-layer perceptrons and deep convolutional neural networks, we show that the network's data efficiency depends on whether the unnecessary dimensions are task-unrelated or task-related (unnecessary due to redundancy). Namely, we demonstrate that increasing the number of task-unrelated dimensions leads to an incorrect inductive bias and as a result degrades data efficiency, while increasing the number of task-related dimensions helps to alleviate the negative impact of the task-unrelated dimensions. These results highlight the need for mechanisms that remove task-unrelated dimensions, such as crops or foveation for image classification, to enable data efficiency gains.
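To make the distinction concrete, the following is a minimal sketch of how the two kinds of unnecessary dimensions described in the abstract could be appended to a flattened input: task-unrelated dimensions carry no information about the label (analogous to background pixels), while task-related dimensions are redundant copies of existing informative dimensions. The function name and construction are illustrative assumptions, not the authors' code.

```python
import numpy as np

def add_unnecessary_dims(x, n_unrelated=0, n_related=0, rng=None):
    """Append unnecessary dimensions to a flattened input vector x.

    - task-unrelated: random values carrying no information about the label
      (analogous to background clutter around an object).
    - task-related: duplicates of existing input dimensions, redundant but
      still informative about the target.
    """
    rng = np.random.default_rng() if rng is None else rng
    unrelated = rng.uniform(0.0, 1.0, size=n_unrelated)        # pure noise dims
    related = x[rng.integers(0, x.shape[0], size=n_related)]   # redundant copies
    return np.concatenate([x, unrelated, related])

# Example: a 4-dimensional input padded with 3 task-unrelated and 2 task-related dims.
x = np.array([0.2, 0.9, 0.1, 0.7])
x_aug = add_unnecessary_dims(x, n_unrelated=3, n_related=2,
                             rng=np.random.default_rng(0))
print(x_aug.shape)  # (9,)
```

Under this construction, sweeping `n_unrelated` and `n_related` while varying the training-set size would reproduce the kind of data-efficiency comparison the abstract describes.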
