no code implementations • 4 Oct 2022 • Omri Raccah, Phoebe Chen, Ted L. Willke, David Poeppel, Vy A. Vo
The computational complexity of the self-attention mechanism in Transformer models significantly limits their ability to generalize over long temporal durations.
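As a hypothetical illustration (not part of the paper), the complexity referenced here arises because scaled dot-product self-attention scores every token against every other token, producing an n×n matrix for a sequence of n tokens. The minimal NumPy sketch below shows where that quadratic cost appears; the weight shapes and sizes are arbitrary choices for the example.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sequence.

    x: (seq_len, d_model) input embeddings.
    The score matrix is (seq_len, seq_len), so time and memory grow
    quadratically with sequence length -- the bottleneck for long contexts.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # (seq_len, d_k) each
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq_len, seq_len): O(n^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                               # (seq_len, d_k)

# Doubling the sequence length quadruples the number of pairwise scores.
rng = np.random.default_rng(0)
d_model, d_k = 64, 64
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
for n in (512, 1024, 2048):
    x = rng.normal(size=(n, d_model))
    _ = self_attention(x, w_q, w_k, w_v)
    print(n, "tokens ->", n * n, "pairwise attention scores")
```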