Why does Negative Sampling not Work Well? Analysis of Convexity in Negative Sampling

29 Sep 2021 · Hidetaka Kamigaito, Katsuhiko Hayashi

A negative sampling (NS) loss function is widely used across tasks because its noise distribution can be chosen to reflect the properties of the target task. In particular, since the NS loss has no normalization term, it is computationally efficient for classification problems with a large number of labels, such as knowledge graph embedding (KGE). On the other hand, properties of the NS loss that are considered important for learning, such as the relationship between the noise distribution and the number of negative samples, have not been investigated theoretically. By analyzing the gradient of the NS loss, we show that it is non-convex but has a partially convex domain. We then derive conditions on the noise distribution and the number of negative samples required for efficient learning under this property. We find that when these conditions are satisfied and the loss is combined with a scoring method that handles only non-negative values, the NS loss behaves like a convex loss function, which enables efficient learning. Experimental results on FB15k-237, WN18RR, and YAGO3-10 show that an NS loss satisfying the proposed conditions improves KG completion performance with TransE and RotatE, whose scoring methods handle only non-negative values.
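For reference, a sketch of the NS loss in its standard form (as used in word2vec-style training and KGE models such as RotatE; the notation here is illustrative rather than quoted from the paper): for a positive pair (x, y), a scoring function s_theta, nu negative samples drawn from a noise distribution p_n, and the sigmoid sigma, the per-example loss is

\ell_{\mathrm{NS}}(\theta) = -\log \sigma(s_\theta(x, y)) - \sum_{i=1}^{\nu} \mathbb{E}_{\tilde{y}_i \sim p_n}\!\left[ \log \sigma(-s_\theta(x, \tilde{y}_i)) \right]

Since no normalization over the full label set appears, each update touches only nu + 1 scores, which is the computational advantage the abstract refers to. Below is a minimal PyTorch sketch of this loss paired with a TransE-style score; the helper names are hypothetical, and the margin gamma and sign convention follow the RotatE paper as assumptions, not the authors' code.

import torch
import torch.nn.functional as F

def transe_score(h, r, t, gamma=12.0):
    # TransE-style score gamma - ||h + r - t||_2; the distance term itself
    # is non-negative, the property the paper's conditions rely on.
    return gamma - torch.norm(h + r - t, p=2, dim=-1)

def ns_loss(pos_score, neg_scores):
    # pos_score: (batch,) scores of positive triples.
    # neg_scores: (batch, nu) scores of nu sampled negatives per positive,
    # approximating the expectation over the noise distribution p_n.
    pos_term = F.logsigmoid(pos_score)            # log sigma(s(x, y))
    neg_term = F.logsigmoid(-neg_scores).sum(-1)  # sum_i log sigma(-s(x, y_i))
    return -(pos_term + neg_term).mean()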


Datasets

FB15k-237 · WN18RR · YAGO3-10

Methods

Negative Sampling · TransE · RotatE