A Switch FFN is a sparse layer that operates independently on each token in the input sequence. It is shown as the blue block in the figure. We diagram two tokens ($x_{1}$ = "More" and $x_{2}$ = "Parameters") being routed (solid lines) across four FFN experts, where the router routes each token independently. The Switch FFN layer returns the output of the selected expert multiplied by the router gate value (dotted line).
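The top-1 routing described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the router weights, single-matrix expert FFNs, and ReLU nonlinearity below are simplifying assumptions chosen to show only the routing mechanism (softmax over experts, argmax selection, and scaling by the gate value).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 2

# Hypothetical parameters: a linear router and one weight matrix per expert.
# (Real Switch FFN experts are two-layer MLPs; one matrix keeps the sketch short.)
W_router = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def switch_ffn(x):
    """Route each token to its top-1 expert; scale output by the gate value."""
    probs = softmax(x @ W_router)      # router probabilities, shape (tokens, experts)
    chosen = probs.argmax(axis=-1)     # top-1 expert index per token (solid lines)
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        gate = probs[i, e]             # router gate value (dotted line)
        out[i] = gate * np.maximum(x[i] @ experts[e], 0.0)  # selected expert FFN
    return out, chosen

# Two tokens (e.g. "More" and "Parameters"), each routed independently.
tokens = rng.normal(size=(n_tokens, d_model))
outputs, routed_to = switch_ffn(tokens)
```

Because only the selected expert runs per token, compute stays roughly constant as the number of experts (and thus parameters) grows, which is the core idea the figure conveys.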
Source: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
| Task | Papers | Share |
|---|---|---|
| Machine Translation | 3 | 15.00% |
| Language Modelling | 3 | 15.00% |
| Question Answering | 3 | 15.00% |
| Translation | 2 | 10.00% |
| Text Generation | 2 | 10.00% |
| Large Language Model | 1 | 5.00% |
| Text Classification | 1 | 5.00% |
| Reinforcement Learning (RL) | 1 | 5.00% |
| Common Sense Reasoning | 1 | 5.00% |