Positional Encoding Generator

Introduced by Chu et al. in Conditional Positional Encodings for Vision Transformers

Positional Encoding Generator, or PEG, is a module used in Conditional Positional Encoding (CPE) position embeddings. It dynamically produces the positional encodings conditioned on the local neighborhood of each input token. To condition on the local neighbors, the flattened input sequence $X \in \mathbb{R}^{B \times N \times C}$ of DeiT is first reshaped back to $X^{\prime} \in \mathbb{R}^{B \times H \times W \times C}$ in the 2-D image space. Then a function (denoted by $\mathcal{F}$ in the paper's figure) is repeatedly applied to local patches in $X^{\prime}$ to produce the conditional positional encodings $E \in \mathbb{R}^{B \times H \times W \times C}$. PEG can be efficiently implemented with a 2-D convolution of kernel size $k$ ($k \geq 3$) and $\frac{k-1}{2}$ zero padding. Note that the zero padding is important for making the model aware of absolute positions, and $\mathcal{F}$ can take various forms, such as a separable convolution.
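As a concrete illustration, below is a minimal PyTorch sketch of a PEG layer in its simplest form described above: a depthwise $k \times k$ convolution with $\frac{k-1}{2}$ zero padding, whose output is added back to the tokens. The class name `PEG`, its argument names, and the shape conventions are ours for illustration, not the authors' reference implementation; a class token, if present, would be split off before the reshape and re-attached afterward.

```python
import torch
import torch.nn as nn


class PEG(nn.Module):
    """Sketch of a Positional Encoding Generator: a depthwise 2-D
    convolution over the token grid, with zero padding so border
    tokens can infer their absolute position."""

    def __init__(self, dim: int, k: int = 3):
        super().__init__()
        # Depthwise conv (groups=dim); padding (k-1)/2 preserves H x W.
        self.proj = nn.Conv2d(dim, dim, kernel_size=k,
                              padding=(k - 1) // 2, groups=dim)

    def forward(self, x: torch.Tensor, H: int, W: int) -> torch.Tensor:
        # x: (B, N, C) flattened token sequence with N == H * W.
        B, N, C = x.shape
        feat = x.transpose(1, 2).reshape(B, C, H, W)  # back to 2-D grid
        pos = self.proj(feat)                         # conditional encodings E
        return x + pos.flatten(2).transpose(1, 2)     # add and re-flatten

# Hypothetical usage with DeiT-Tiny-like dimensions:
# peg = PEG(dim=192)
# tokens = torch.randn(2, 14 * 14, 192)   # B=2, N=196 patch tokens
# out = peg(tokens, H=14, W=14)           # same shape, now position-aware
```

Because the encoding is computed by convolution rather than looked up in a fixed table, it adapts to whatever grid size $H \times W$ the input produces, which is what makes the positional encoding "conditional."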

Source: Conditional Positional Encodings for Vision Transformers

Tasks

Task                     Papers   Share
Semantic Segmentation    2        22.22%
Image Classification     2        22.22%
Instance Segmentation    1        11.11%
Novel View Synthesis     1        11.11%
Classification           1        11.11%
General Classification   1        11.11%
Translation              1        11.11%
