GEMSEC: Graph Embedding with Self Clustering

ASONAM 2019  ·  Benedek Rozemberczki, Ryan Davies, Rik Sarkar, Charles Sutton

Modern graph embedding procedures can efficiently extract features of nodes from graphs with millions of nodes. The features are later used as inputs for downstream predictive tasks. In this paper we propose GEMSEC, a graph embedding algorithm that learns a clustering of the nodes simultaneously with computing their features. The procedure places nodes in an abstract feature space where the vertex features minimize the negative log-likelihood of preserving sampled vertex neighborhoods, while the nodes are clustered into a fixed number of groups in this space. GEMSEC is a general extension of earlier work in the domain, as it augments the core optimization problem of sequence-based graph embedding procedures and is agnostic to the neighborhood sampling strategy. We show that GEMSEC extracts high-quality clusters on real-world social networks and is competitive with other community detection algorithms. We demonstrate that the clustering constraint has a positive effect on representation quality, and that our procedure learns to embed and cluster graphs jointly in a robust and scalable manner.
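The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the loss, so everything concrete here is an assumption: a full-softmax negative log-likelihood over sampled (source, neighbor) pairs, a clustering cost equal to each node's Euclidean distance to its nearest cluster center, and a weight `gamma` that trades the two off. The function names (`embedding_nll`, `clustering_cost`, `joint_loss`) are hypothetical and are not taken from the authors' code.

```python
import math


def embedding_nll(features, pairs):
    """Negative log-likelihood of sampled (source, neighbor) pairs.

    Assumes a softmax over dot products between the source node's
    feature vector and every node's feature vector.
    """
    total = 0.0
    for src, ctx in pairs:
        scores = [sum(a * b for a, b in zip(features[src], f)) for f in features]
        # Log of the softmax normalizer, computed in a numerically stable way.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[ctx]
    return total


def clustering_cost(features, centers):
    """Sum over nodes of the Euclidean distance to the nearest center."""
    return sum(min(math.dist(f, c) for c in centers) for f in features)


def joint_loss(features, centers, pairs, gamma=0.1):
    """Embedding term plus a weighted clustering term (gamma is assumed)."""
    return embedding_nll(features, pairs) + gamma * clustering_cost(features, centers)


# Toy usage: 4 nodes in 2 dimensions, 2 cluster centers, and pairs that
# could come from any neighborhood sampling strategy (e.g. random walks).
features = [[0.1, 0.2], [0.0, 0.3], [0.9, 0.8], [1.0, 0.7]]
centers = [[0.0, 0.25], [1.0, 0.75]]
pairs = [(0, 1), (1, 0), (2, 3), (3, 2)]
loss = joint_loss(features, centers, pairs, gamma=0.1)
```

In a training loop, both `features` and `centers` would be updated by gradient descent on this combined loss. The sampler that produces `pairs` can be swapped without touching the loss, which is one way to read the abstract's claim that the method is agnostic to the neighborhood sampling strategy.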

