Look Back Again: Dual Parallel Attention Network for Accurate and Robust Scene Text Recognition

ICMR 2021  ·  Zilong Fu, Guoqing Jin, Hongtao Xie, Junbo Guo

Parallel-decoupled encoder-decoder (PDED) frameworks have become a popular choice in scene text recognition for their flexibility and efficiency. However, the parallel positional attention module (PPAM) used in this kind of framework suffers from inconsistent information content between its queries and keys (queries carry position information, while keys carry context and position information), so visual misalignment tends to appear on hard samples (e.g., blurred text, irregular text, or low-quality images). To tackle this issue, we propose a dual parallel attention network (DPAN), in which a newly designed parallel context attention module (PCAM) is cascaded with the original PPAM and uses linguistic contextual information to compensate for the information inconsistency between queries and keys. Specifically, PCAM takes the visual features from PPAM as inputs and applies a bidirectional language model to enhance them with linguistic context, producing the queries. In this way, the information content of the queries and keys is consistent in PCAM, which helps generate more precise visual glimpses and improves the accuracy and robustness of the entire PDED framework. Experimental results verify the effectiveness of the proposed PCAM and show the necessity of keeping the information content of queries and keys consistent in the attention mechanism. On six benchmarks covering both regular and irregular text, DPAN surpasses the existing leading methods by large margins, achieving new state-of-the-art performance. The code is available at https://github.com/Jackandrome/DPAN.
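
The cascade described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation (see the linked repository for that). It assumes plain scaled dot-product attention, uses the same visual features as keys and values, and replaces the bidirectional language model with a toy neighbor-mixing function. The names `attend`, `context_enhance`, and `dual_parallel_attention` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention.

    Each query is handled independently of the others, which is what
    makes the decoding "parallel": one glimpse per character position.
    """
    d = len(keys[0])
    glimpses = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        glimpses.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return glimpses


def context_enhance(glimpses):
    """ASSUMPTION: toy stand-in for the bidirectional language model.

    Mixes each position with its left and right neighbors so that every
    query carries some context from both directions.
    """
    dim = len(glimpses[0])
    zero = [0.0] * dim
    enhanced = []
    for t, g in enumerate(glimpses):
        left = glimpses[t - 1] if t > 0 else zero
        right = glimpses[t + 1] if t < len(glimpses) - 1 else zero
        enhanced.append([g[j] + 0.5 * (left[j] + right[j]) for j in range(dim)])
    return enhanced


def dual_parallel_attention(pos_queries, visual_feats):
    """Two cascaded attention stages over the same visual features."""
    # Stage 1 (PPAM-like): queries hold position information only.
    first = attend(pos_queries, visual_feats, visual_feats)
    # Stage 2 (PCAM-like): queries are the stage-1 glimpses enriched with context.
    ctx_queries = context_enhance(first)
    second = attend(ctx_queries, visual_feats, visual_feats)
    return first, second


random.seed(0)
T, N, D = 4, 6, 8  # character positions, visual feature vectors, feature size
pos_queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
visual_feats = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
first, second = dual_parallel_attention(pos_queries, visual_feats)
print(len(second), len(second[0]))
```

In this sketch the second stage differs from the first only in where its queries come from, which mirrors the point of the abstract: the keys stay the same, and the queries are made to carry comparable information.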

Results from the Paper


Task                      Dataset     Model   Metric Name   Metric Value (%)   Global Rank
Scene Text Recognition    CUTE80      DPAN    Accuracy      91.9               # 17
Scene Text Recognition    ICDAR2013   DPAN    Accuracy      97.7               # 13
Scene Text Recognition    ICDAR2015   DPAN    Accuracy      85.5               # 14
Scene Text Recognition    IIIT5k      DPAN    Accuracy      96.2               # 17
Scene Text Recognition    SVT         DPAN    Accuracy      93.9               # 18
Scene Text Recognition    SVTP        DPAN    Accuracy      89.0               # 17
