no code implementations • 6 Feb 2024 • Adjorn van Engelenhoven, Nicola Strisciuglio, Estefanía Talavera
The self-attention computed within each cluster is then combined with summaries of the other clusters, enabling information to flow across the entire input sequence.
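The combination described above can be sketched in a few lines of NumPy. This is a minimal illustration of one plausible reading, not the authors' implementation. It assumes that cluster labels are already given, that a cluster summary is the mean of its member tokens, and that queries, keys, and values are the raw token vectors with no learned projections. The name `cluster_attention` and its parameters are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cluster_attention(x, labels):
    """Attend within each cluster plus to the summaries of all other clusters.

    x:      (n, d) array of token vectors.
    labels: (n,) integer array assigning each token to a cluster.
    Assumption: a summary is the mean of a cluster's tokens.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    n, d = x.shape
    cluster_ids = np.unique(labels)

    # One summary vector per cluster (assumed: the mean of its members).
    summaries = np.stack([x[labels == c].mean(axis=0) for c in cluster_ids])

    out = np.zeros((n, d), dtype=float)
    for i, c in enumerate(cluster_ids):
        members = labels == c
        q = x[members]                              # tokens of this cluster
        others = np.delete(summaries, i, axis=0)    # summaries of the other clusters
        kv = np.concatenate([q, others], axis=0)    # local tokens + remote summaries
        weights = softmax(q @ kv.T / np.sqrt(d))    # scaled dot-product attention
        out[members] = weights @ kv
    return out
```

Under these assumptions, each token attends to roughly (cluster size + number of clusters - 1) entries instead of all n tokens, while the summaries still carry information from the rest of the sequence. With a single cluster the function reduces to ordinary full self-attention.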