Transitivity-Preserving Graph Representation Learning for Bridging Local Connectivity and Role-based Similarity

18 Aug 2023  ·  Van Thuy Hoang, O-Joun Lee ·

Graph representation learning (GRL) methods, such as graph neural networks and graph transformer models, have been successfully used to analyze graph-structured data, mainly focusing on node classification and link prediction tasks. However, existing studies mostly consider only local connectivity, ignoring long-range connectivity and the roles of nodes. In this paper, we propose Unified Graph Transformer Networks (UGT) that effectively integrate local and global structural information into fixed-length vector representations. First, UGT learns local structure by identifying local substructures and aggregating features of the $k$-hop neighborhood of each node. Second, we construct virtual edges, bridging distant nodes with structural similarity to capture long-range dependencies. Third, UGT learns unified representations through self-attention, encoding the structural distance and the $p$-step transition probability between node pairs. Furthermore, we propose a self-supervised learning task that effectively learns transition probabilities to fuse local and global structural features, which can then be transferred to other downstream tasks. Experimental results on real-world benchmark datasets over various downstream tasks show that UGT significantly outperforms state-of-the-art baselines. In addition, UGT reaches the expressive power of the third-order Weisfeiler-Lehman isomorphism test (3d-WL) in distinguishing non-isomorphic graph pairs. The source code is available at https://github.com/NSLab-CUK/Unified-Graph-Transformer.
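The two structural signals named in the abstract can be sketched concretely. Below is a minimal illustration, assuming a simple random-walk transition matrix ($T = D^{-1}A$, raised to the $p$-th power) and a degree-based stand-in for the paper's structural-similarity criterion when adding virtual edges; the function names and the similarity measure are illustrative, not the authors' implementation.

```python
import numpy as np

def transition_probabilities(adj, p):
    """p-step random-walk transition matrix T^p, where T = D^{-1} A."""
    deg = adj.sum(axis=1, keepdims=True)
    T = adj / np.clip(deg, 1, None)      # row-normalize: each row sums to 1
    return np.linalg.matrix_power(T, p)

def virtual_edges(adj, threshold=0.9):
    """Connect non-adjacent node pairs whose degree-based signatures are
    similar -- a toy proxy for the structural similarity used by UGT."""
    deg = adj.sum(axis=1)
    n = len(deg)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i, j] == 0:           # only bridge distant (non-adjacent) pairs
                sim = 1.0 - abs(deg[i] - deg[j]) / max(deg[i], deg[j], 1)
                if sim >= threshold:
                    edges.append((i, j))
    return edges

# Toy path graph 0-1-2-3: the two endpoints play the same "role"
# (degree 1) despite being three hops apart.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

Tp = transition_probabilities(A, 2)   # 2-step transition probabilities
ve = virtual_edges(A)                 # → [(0, 3)]: endpoints get bridged
```

In UGT, such virtual edges let self-attention attend between structurally similar but distant nodes, while the transition probabilities serve as a relative positional signal between node pairs.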


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Node Clustering | Actor | UGT | Modularity | 0.50 | #1 |
| Node Clustering | Actor | UGT | Conductance | 0.28 | #1 |
| Node Classification | Brazil Air-Traffic | UGT | Accuracy | 0.8 ± 0.05 | #1 |
| Node Clustering | Brazil Air-Traffic | UGT | Conductance | 0.51 | #1 |
| Node Clustering | Brazil Air-Traffic | UGT | Modularity | 0.22 | #1 |
| Node Clustering | Chameleon | UGT | Modularity | 0.66 | #1 |
| Node Clustering | Chameleon | UGT | Conductance | 0.11 | #1 |
| Node Classification | Chameleon | UGT | Accuracy | 69.78 ± 3.21 | #26 |
| Node Clustering | Citeseer | UGT | Modularity | 0.78 | #2 |
| Node Clustering | Citeseer | UGT | Conductance | 0.04 | #1 |
| Node Classification | Citeseer | UGT | Accuracy | 76.08 ± 2.5 | #17 |
| Node Clustering | Cora | UGT | Modularity | 0.76 | #1 |
| Node Clustering | Cora | UGT | Conductance | 0.09 | #1 |
| Node Classification | Cora | UGT | Accuracy | 88.74 ± 0.6% | #7 |
| Node Clustering | Cornell | UGT | Conductance | 0.28 | #1 |
| Node Clustering | Cornell | UGT | Modularity | 0.47 | #1 |
| Node Classification | Cornell | UGT | Accuracy | 70.0 ± 4.44 | #44 |
| Graph Classification | ENZYMES | UGT | Accuracy | 67.22 ± 3.92 | #12 |
| Node Classification | Europe Air-Traffic | UGT | Accuracy | 56.92 ± 6.36 | #1 |
| Node Clustering | Europe Air-Traffic | UGT | Conductance | 0.51 | #1 |
| Node Clustering | Europe Air-Traffic | UGT | Modularity | 0.20 | #1 |
| Node Classification | Film (60%/20%/20% random splits) | UGT | 1:1 Accuracy | 36.84 ± 0.62 | #23 |
| Graph Classification | NCI1 | UGT | Accuracy | 77.55 ± 0.16% | #33 |
| Graph Classification | NCI109 | UGT | Accuracy | 75.45 ± 1.26 | #17 |
| Graph Classification | PROTEINS | UGT | Accuracy | 80.12 ± 0.32 | #8 |
| Node Classification | Squirrel | UGT | Accuracy | 66.96 ± 2.49 | #14 |
| Node Clustering | Texas | UGT | Conductance | 0.33 | #1 |
| Node Clustering | Texas | UGT | Modularity | 0.46 | #1 |
| Node Classification | Texas | UGT | Accuracy | 86.67 ± 8.31 | #14 |
| Node Classification | USA Air-Traffic | UGT | Accuracy | 66.22 ± 4.55 | #1 |
| Node Clustering | USAir | UGT | Modularity | 0.30 | #1 |
| Node Clustering | USAir | UGT | Conductance | 0.34 | #1 |
| Node Clustering | Wiki Squirrel | UGT | Conductance | 0.21 | #1 |
| Node Clustering | Wiki Squirrel | UGT | Modularity | 0.74 | #1 |
| Node Clustering | Wisconsin | UGT | Conductance | 0.27 | #1 |
| Node Clustering | Wisconsin | UGT | Modularity | 0.52 | #1 |
| Node Classification | Wisconsin | UGT | Accuracy | 81.6 ± 8.24 | #41 |

Methods