DiffusionSTR: Diffusion Model for Scene Text Recognition

29 Jun 2023 · Masato Fujitake

This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end framework that uses diffusion models to recognize text in the wild. Whereas existing studies treat scene text recognition as an image-to-text transformation, we recast it as a text-to-text transformation conditioned on the image within a diffusion model. We show for the first time that diffusion models can be applied to text recognition. Furthermore, experiments on publicly available datasets show that the proposed method achieves accuracy competitive with state-of-the-art methods.


Results from the Paper


Task                    Dataset    Model         Metric Name  Metric Value  Global Rank
Scene Text Recognition  CUTE80     DiffusionSTR  Accuracy     92.5          # 15
Scene Text Recognition  ICDAR2013  DiffusionSTR  Accuracy     97.1          # 17
Scene Text Recognition  ICDAR2015  DiffusionSTR  Accuracy     86            # 13
Scene Text Recognition  IIIT5k     DiffusionSTR  Accuracy     97.3          # 12
Scene Text Recognition  SVT        DiffusionSTR  Accuracy     93.6          # 20
Scene Text Recognition  SVTP       DiffusionSTR  Accuracy     89.2          # 16
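
For readers unfamiliar with the metric, accuracy in scene text recognition is commonly reported as word-level accuracy: the percentage of cropped word images whose predicted string exactly matches the ground truth. The sketch below illustrates that computation. The normalization (case-insensitive, ASCII letters and digits only) is an assumption based on common practice in this field, not something this page states about DiffusionSTR, and the function names `normalize` and `word_accuracy` are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits (assumed protocol)."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(predictions, ground_truths) -> float:
    """Percentage of samples whose normalized prediction matches exactly."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths differ in length")
    if not ground_truths:
        raise ValueError("at least one sample is required")
    correct = sum(
        normalize(p) == normalize(g)
        for p, g in zip(predictions, ground_truths)
    )
    return 100.0 * correct / len(ground_truths)


# Toy data for illustration only; these are not results from the paper.
preds = ["Hello", "w0rld", "text!", "scene"]
truth = ["hello", "world", "TEXT", "scene"]
print(word_accuracy(preds, truth))  # 75.0
```

Under this convention a single wrong character makes the whole word count as an error, which is why a difference of a few points between benchmarks can reflect many harder images.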
