Cross-View Training, or CVT, is a semi-supervised learning algorithm that improves distributed word representations by making use of both labelled and unlabelled examples.
CVT adds $k$ auxiliary prediction modules to the model (a Bi-LSTM encoder); these are used when learning on unlabelled examples. A prediction module is usually a small neural network (e.g., a hidden layer followed by a softmax layer). Each one takes as input an intermediate representation $h^j(x_i)$ produced by the model (e.g., the outputs of one of the LSTMs in a Bi-LSTM model) and outputs a distribution over labels $p_{j}^{\theta}\left(y\mid{x_{i}}\right)$.
Each $h^j$ is chosen such that it only uses a part of the input $x_i$ (a restricted "view"); the particular choice can depend on the task and model architecture. The auxiliary prediction modules are only used during training; test-time predictions come from the primary prediction module, which produces $p_\theta$ from the full input.
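The training signal on unlabelled data can be sketched as a consistency loss: each auxiliary module, seeing only its restricted view, is pushed to match the primary module's full-view prediction (which is treated as a fixed target). The toy NumPy example below illustrates this with two hypothetical views, a forward-only and a backward-only slice of the representation; all shapes, weights, and names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # KL divergence KL(p || q) between two label distributions.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

rng = np.random.default_rng(0)

# Hypothetical toy setup: one token, 4-dim representation, 3 labels.
h_full = rng.normal(size=4)       # full-view representation (primary module input)
h_fwd = h_full[:2]                # forward-only view (auxiliary module 1)
h_bwd = h_full[2:]                # backward-only view (auxiliary module 2)

W_full = rng.normal(size=(4, 3))  # primary prediction module (linear + softmax)
W_fwd = rng.normal(size=(2, 3))   # auxiliary prediction modules
W_bwd = rng.normal(size=(2, 3))

# On an unlabelled example, the primary prediction acts as a fixed target
# (no gradient flows into it during the CVT step).
p_primary = softmax(h_full @ W_full)
p_fwd = softmax(h_fwd @ W_fwd)
p_bwd = softmax(h_bwd @ W_bwd)

# CVT consistency loss: each restricted-view auxiliary module is trained
# to match the full-view primary distribution.
l_cvt = kl(p_primary, p_fwd) + kl(p_primary, p_bwd)
print(l_cvt)
```

In training, this loss would be minimised with respect to the auxiliary (and shared encoder) parameters, alongside the usual supervised loss on labelled examples.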
Source: *Semi-Supervised Sequence Modeling with Cross-View Training*
| Task | Papers | Share |
|---|---|---|
| Sentence | 4 | 15.38% |
| Named Entity Recognition (NER) | 2 | 7.69% |
| Dependency Parsing | 2 | 7.69% |
| Graph Representation Learning | 1 | 3.85% |
| Link Prediction | 1 | 3.85% |
| Language Modelling | 1 | 3.85% |
| Semantic Textual Similarity | 1 | 3.85% |
| Sentence Embedding | 1 | 3.85% |
| Person Re-Identification | 1 | 3.85% |
| Component | Type |
|---|---|
| Additive Attention | Attention Mechanisms |
| CNN BiLSTM | Bidirectional Recurrent Neural Networks |
| Dropout | Regularization |
| Softmax | Output Functions |