Learning dynamic and hierarchical traffic spatiotemporal features with Transformer

12 Apr 2021 · Haoyang Yan, Xiaolei Ma

Traffic forecasting is an indispensable part of Intelligent Transportation Systems (ITS), and long-term, network-wide, accurate traffic speed forecasting is one of its most challenging tasks. Recently, deep learning methods have become popular in this domain. Because traffic data are physically tied to road networks, most proposed models treat the task as a spatiotemporal graph modeling problem and rely on Graph Convolutional Network (GCN) based methods. These GCN-based models depend heavily on a predefined, fixed adjacency matrix to capture spatial dependency. However, a predefined, fixed adjacency matrix is limited in its ability to reflect the actual dependencies in traffic flow. To overcome these limitations, this paper proposes a novel model, Traffic Transformer, for spatiotemporal graph modeling and long-term traffic forecasting. The Transformer is the most popular framework in Natural Language Processing (NLP); by adapting it to the spatiotemporal setting, Traffic Transformer hierarchically and dynamically extracts spatiotemporal features from data through multi-head attention and masked multi-head attention mechanisms, and fuses these features for traffic forecasting. Furthermore, analyzing the attention weight matrices reveals the influential parts of the road network, allowing us to better understand traffic networks. Experimental results on public traffic network datasets and on real-world traffic network datasets that we generated ourselves demonstrate that the proposed model outperforms state-of-the-art baselines.
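The central idea, replacing a predefined adjacency matrix with attention weights learned from the data, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation; the node count, layer sizes, and variable names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from the paper):
# 207 road sensors, 64-dim node embeddings, 8 attention heads.
num_nodes, embed_dim, num_heads = 207, 64, 8

attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# x: node embeddings for one time step, shape (batch, num_nodes, embed_dim)
x = torch.randn(32, num_nodes, embed_dim)

# Self-attention over the node dimension: each node attends to every other
# node, so spatial dependencies are inferred from the data at run time
# rather than fixed in advance by a predefined adjacency matrix.
out, attn_weights = attn(x, x, x)

# attn_weights has shape (batch, num_nodes, num_nodes): a learned, dynamic
# "adjacency matrix". Inspecting it reveals which parts of the road network
# are most influential, as the abstract suggests.
print(attn_weights.shape)  # torch.Size([32, 207, 207])
```

In the full model, a masked variant of the same mechanism would constrain the temporal dimension so that the prediction at each step attends only to past observations, mirroring the masked multi-head attention the abstract describes.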
