VideoBERT adapts the powerful BERT model to learn a joint visual-linguistic representation for video. It is used in numerous tasks, including action classification and video captioning.
Source: VideoBERT: A Joint Model for Video and Language Representation LearningPaper | Code | Results | Date | Stars |
---|
Task | Papers | Share |
---|---|---|
Language Modelling | 3 | 18.75% |
Sentence | 1 | 6.25% |
Image Retrieval | 1 | 6.25% |
Question Answering | 1 | 6.25% |
Retrieval | 1 | 6.25% |
Visual Grounding | 1 | 6.25% |
Visual Question Answering | 1 | 6.25% |
Visual Question Answering (VQA) | 1 | 6.25% |
Action Classification | 1 | 6.25% |
Component | Type |
|
---|---|---|
🤖 No Components Found | You can add them if they exist; e.g. Mask R-CNN uses RoIAlign |