no code implementations • 20 Mar 2024 • Diwei Wang, Kun Yuan, Candice Muller, Frédéric Blanc, Nicolas Padoy, Hyewon Seo
Based on a large-scale pre-trained Vision Language Model (VLM), our model learns and improves visual, textual, and numerical representations of patient gait videos, through a collective learning across three distinct modalities: gait videos, class-specific descriptions, and numerical gait parameters.