1 code implementation • 3 Mar 2024 • Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee
The Attention Alignment Module in A2D performs a dense head-by-head comparison between student and teacher attention heads across layers, turning the combinatorial mapping heuristics into a learning problem.