Subformer is a Transformer variant that combines sandwich-style parameter sharing, which avoids the quality loss of naive cross-layer parameter sharing in generative models, with self-attentive embedding factorization (SAFE), in which a small self-attention layer reduces the embedding parameter count.
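The two ideas can be illustrated with a minimal sketch. This is not the paper's implementation: the layer structure is reduced to a single weight matrix, and all dimensions, weights, and function names (`make_layer`, `safe_embed`) are hypothetical. Sandwich-style sharing keeps unique parameters for the first and last layers while the middle layers reuse one shared set; SAFE stores a small embedding table and uses one tiny self-attention layer to project embeddings up to the model dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_embed, vocab, n_layers = 16, 4, 100, 6  # toy sizes, chosen for illustration

def make_layer(d):
    # Hypothetical stand-in for a full Transformer layer's parameters.
    return {"W": rng.normal(size=(d, d))}

# Sandwich-style sharing: unique bottom layer, one shared middle block
# reused (n_layers - 2) times, unique top layer.
shared = make_layer(d_model)
layers = [make_layer(d_model)] + [shared] * (n_layers - 2) + [make_layer(d_model)]
unique_param_sets = len({id(layer["W"]) for layer in layers})  # 3 sets serve 6 layers

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# SAFE sketch: a (vocab x d_embed) table instead of (vocab x d_model),
# plus one small self-attention layer whose value projection up-projects.
E = rng.normal(size=(vocab, d_embed))
Wq = rng.normal(size=(d_embed, d_embed))
Wk = rng.normal(size=(d_embed, d_embed))
Wv = rng.normal(size=(d_embed, d_model))  # values land in d_model

def safe_embed(token_ids):
    x = E[token_ids]                                        # (seq, d_embed)
    scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(d_embed)       # (seq, seq)
    return softmax(scores) @ (x @ Wv)                       # (seq, d_model)

out = safe_embed(np.array([1, 5, 7]))

# Parameter comparison for the embedding side:
safe_params = E.size + Wq.size + Wk.size + Wv.size   # 496
full_params = vocab * d_model                        # 1600
```

Even at these toy sizes the factorized embedding uses fewer parameters than a full `vocab x d_model` table; the gap grows with realistic vocabularies and model widths.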
Source: Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers
| Task | Papers | Share |
|---|---|---|
| Abstractive Text Summarization | 2 | 20.00% |
| Language Modelling | 2 | 20.00% |
| Machine Translation | 2 | 20.00% |
| Translation | 2 | 20.00% |
| Graph Representation Learning | 1 | 10.00% |
| Decoder | 1 | 10.00% |