Label Attention Network for sequential multi-label classification: you were looking at a wrong self-attention

Most of the available user information can be represented as a sequence of timestamped events. Each event is assigned a set of categorical labels whose future structure is of great interest. For instance, the goal may be to predict the group of items in a customer's next purchase or a client's transactions tomorrow. This is a multi-label classification problem for sequential data. Modern approaches focus on the transformer architecture for sequential data, introducing self-attention over the elements of a sequence. In that case, we take into account the temporal interactions between events but lose information on label inter-dependencies. Motivated by this shortcoming, we propose leveraging a self-attention mechanism over the labels preceding the predicted step. As our approach is a Label-Attention NETwork, we call it LANET. Experimental evidence suggests that LANET outperforms established models and captures interconnections between labels well. For example, the micro-AUC of our approach is $0.9536$ compared to $0.7501$ for a vanilla transformer. We provide an implementation of LANET to facilitate its wider usage.
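
To make the idea of attention over labels (rather than over events) concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how label representations are built, so everything here is an assumption made for illustration: the function names, the single attention head, the random untrained weights, and the choice to represent each label by an embedding shifted by its frequency in the preceding events.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def label_self_attention(label_vecs, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention across label vectors."""
    queries = [matvec(w_q, x) for x in label_vecs]
    keys = [matvec(w_k, x) for x in label_vecs]
    values = [matvec(w_v, x) for x in label_vecs]
    scale = math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)  # how much this label attends to every label
        weights.append(attn)
        outputs.append(
            [sum(a * v[d] for a, v in zip(attn, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


def predict_label_probs(history, num_labels, dim=4, seed=0):
    """Score every label for the next step from the label sets of past events.

    `history` is a list of sets of label indices, one set per preceding event.
    All weights are random and untrained: this only shows the data flow.
    """
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    label_emb = rand_matrix(num_labels, dim)
    w_q, w_k, w_v = (rand_matrix(dim, dim) for _ in range(3))
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    # Hypothetical label representation: embedding shifted by the label's
    # frequency in the preceding events (an assumption, not from the paper).
    n_events = max(len(history), 1)
    freq = [sum(label in event for event in history) / n_events
            for label in range(num_labels)]
    label_vecs = [[e + f for e in emb] for emb, f in zip(label_emb, freq)]

    attended, weights = label_self_attention(label_vecs, w_q, w_k, w_v)

    # Independent sigmoid per label, as is usual for multi-label outputs.
    probs = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(w_out, vec))))
             for vec in attended]
    return probs, weights


if __name__ == "__main__":
    past_events = [{0, 2}, {1}, {0, 1}]  # label sets of three preceding events
    probs, attn = predict_label_probs(past_events, num_labels=3)
    print("per-label probabilities:", probs)
    print("label-to-label attention:", attn)
```

In this sketch the attention matrix is indexed by label pairs, so each row shows how strongly one label draws on the others when forming its prediction. This is the kind of inter-dependency that attention over sequence elements alone does not expose.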
