SortCut Sinkhorn Attention is a variant of Sparse Sinkhorn Attention that performs a post-sorting truncation of the input sequence, essentially a hard top-k operation on the sorted input sequence blocks within the computational graph. While most attention models merely re-weight tokens or assign them near-zero weights during training, this allows the input sequence to be explicitly and dynamically truncated. Specifically:
$$ Y = \text{Softmax}\left(Q\,\psi_{S}\left(K\right)^{T}_{\left[:n\right]}\right)\psi_{S}\left(V\right)_{\left[:n\right]} $$
where $\psi_{S}$ is the sorting network and $n$ is the SortCut budget hyperparameter.
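A minimal sketch of the truncation step may help make the equation concrete. The code below assumes the sorting network $\psi_{S}$ is already available as a (soft) block-permutation matrix, and that the budget $n$ counts blocks; the function and parameter names (`sortcut_attention`, `perm`, `budget`) are illustrative, not taken from the paper's released code.

```python
import torch

def sortcut_attention(q, k, v, perm, budget):
    """q: (batch, seq, dim); k, v: (batch, blocks, block_len, dim);
    perm: (batch, blocks, blocks) block permutation produced by the
    sorting network psi_S; budget: n, the number of blocks kept."""
    b, nb, bl, d = k.shape
    # Apply psi_S: re-order the key/value blocks with the permutation.
    k_sorted = torch.einsum('bij,bjld->bild', perm, k)
    v_sorted = torch.einsum('bij,bjld->bild', perm, v)
    # SortCut: hard truncation, keeping only the first `budget` blocks
    # of the sorted sequence (the [:n] slice in the equation above).
    k_cut = k_sorted[:, :budget].reshape(b, budget * bl, d)
    v_cut = v_sorted[:, :budget].reshape(b, budget * bl, d)
    # Softmax attention over the truncated keys/values; the 1/sqrt(d)
    # scaling is the usual dot-product convention, added here as an
    # assumption since the equation omits it.
    scores = torch.einsum('bqd,bkd->bqk', q, k_cut) / d ** 0.5
    return torch.softmax(scores, dim=-1) @ v_cut

# Example: 16 blocks of length 8; attend only to the top 4 sorted blocks.
q = torch.randn(2, 128, 64)
k = torch.randn(2, 16, 8, 64)
v = torch.randn(2, 16, 8, 64)
perm = torch.eye(16).expand(2, -1, -1)  # identity stands in for psi_S
out = sortcut_attention(q, k, v, perm, budget=4)
print(out.shape)  # torch.Size([2, 128, 64])
```

Because the slice keeps only `budget * block_len` key/value positions, the attention matrix shrinks from seq × seq to seq × (n · block_len), which is where the efficiency gain over dense attention comes from.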
Source: Sparse Sinkhorn Attention
Task | Papers | Share
---|---|---
Document Classification | 1 | 25.00%
Image Generation | 1 | 25.00%
Language Modelling | 1 | 25.00%
Natural Language Inference | 1 | 25.00%
Component | Type
---|---
Feedforward Network | Feedforward Networks
ReLU | Activation Functions
Softmax | Output Functions