A Switch FFN is a sparse layer that operates independently on each token in the input sequence. It is shown as the blue block in the figure. We diagram two tokens ($x_{1}$ = "More" and $x_{2}$ = "Parameters") being routed (solid lines) across four FFN experts, where the router routes each token independently. The Switch FFN layer returns the output of the selected expert multiplied by the router gate value (dotted line).
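The top-1 routing described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the router weights, single-matrix expert FFNs, and ReLU nonlinearity below are simplifying assumptions chosen to show only the routing mechanism (softmax over experts, argmax selection, and scaling by the gate value).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 2

# Hypothetical parameters: a linear router and one weight matrix per expert.
# (Real Switch FFN experts are two-layer MLPs; one matrix keeps the sketch short.)
W_router = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def switch_ffn(x):
    """Route each token to its top-1 expert; scale output by the gate value."""
    probs = softmax(x @ W_router)      # router probabilities, shape (tokens, experts)
    chosen = probs.argmax(axis=-1)     # top-1 expert index per token (solid lines)
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        gate = probs[i, e]             # router gate value (dotted line)
        out[i] = gate * np.maximum(x[i] @ experts[e], 0.0)  # selected expert FFN
    return out, chosen

# Two tokens (e.g. "More" and "Parameters"), each routed independently.
tokens = rng.normal(size=(n_tokens, d_model))
outputs, routed_to = switch_ffn(tokens)
```

Because only the selected expert runs per token, compute stays roughly constant as the number of experts (and thus parameters) grows, which is the core idea the figure conveys.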
Source: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
| Task | Papers | Share |
|---|---|---|
| Machine Translation | 3 | 15.00% |
| Language Modelling | 3 | 15.00% |
| Question Answering | 3 | 15.00% |
| Translation | 2 | 10.00% |
| Text Generation | 2 | 10.00% |
| Large Language Model | 1 | 5.00% |
| Text Classification | 1 | 5.00% |
| Reinforcement Learning (RL) | 1 | 5.00% |
| Common Sense Reasoning | 1 | 5.00% |