1 code implementation • 24 Mar 2023 • Kastan Day, Daniel Christl, Rohan Salvi, Pranav Sriram
Our backbone, based on a reference Flan-T5-11B architecture, learns a universal representation of the video that is a non-linear sum of the encoder models.
Causal Language Modeling • Language Modelling
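The abstract does not say how the encoder outputs are combined into one representation. The sketch below is a hypothetical illustration, not the authors' method. It assumes each modality encoder (for example video frames, audio, or text) emits a fixed-size vector. It concatenates those vectors and passes them through a two-layer MLP with a GELU activation, which gives one "non-linear sum" of the encoders projected to the backbone's hidden width. The names `NonLinearFusion`, `encoder_dims`, `hidden_dim` and `model_dim`, the random initialisation, and the layer shapes are all invented for this example.

```python
import math
import random

# Hypothetical sketch: fuse several encoder embeddings with a small non-linear MLP.
# All names, sizes and initialisation choices are illustrative assumptions.

def linear(x, weight, bias):
    """Return weight @ x + bias, where weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def gelu(values):
    """Tanh approximation of the GELU activation, applied element-wise."""
    c = math.sqrt(2.0 / math.pi)
    return [0.5 * a * (1.0 + math.tanh(c * (a + 0.044715 * a ** 3))) for a in values]

def init_layer(out_dim, in_dim, rng):
    """Random weights scaled by 1/sqrt(in_dim); biases start at zero."""
    scale = 1.0 / math.sqrt(in_dim)
    weight = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim

class NonLinearFusion:
    """Concatenate per-encoder embeddings, then apply Linear -> GELU -> Linear."""

    def __init__(self, encoder_dims, hidden_dim, model_dim, seed=0):
        rng = random.Random(seed)  # fixed seed so the sketch is reproducible
        self.encoder_dims = list(encoder_dims)
        self.w1, self.b1 = init_layer(hidden_dim, sum(self.encoder_dims), rng)
        self.w2, self.b2 = init_layer(model_dim, hidden_dim, rng)

    def __call__(self, embeddings):
        if [len(e) for e in embeddings] != self.encoder_dims:
            raise ValueError("embedding sizes do not match encoder_dims")
        concat = [v for e in embeddings for v in e]     # simple concatenation
        hidden = gelu(linear(concat, self.w1, self.b1))  # non-linear mixing step
        return linear(hidden, self.w2, self.b2)          # project to backbone width

fusion = NonLinearFusion(encoder_dims=[4, 3, 2], hidden_dim=8, model_dim=6)
fused = fusion([[0.1] * 4, [0.2] * 3, [0.3] * 2])
print(len(fused))  # one vector of size model_dim
```

Concatenation followed by an MLP is used here because, unlike a plain element-wise sum, it allows encoders with different output sizes. The GELU layer also lets the fused vector depend on interactions between modalities rather than on each encoder independently.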