no code implementations • 13 Feb 2023 • Hongyu Hè, Marko Kabic
However, Transformers fall short in learning long-range dependencies mainly due to the quadratic complexity scaled with context length, in terms of both time and space.
Image Classification