Zero-Shot Long Video Global-Mode Question Answering
1 paper with code • 1 benchmark • 0 datasets
Most implemented papers
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Integrating video foundation models with large language models to build a video understanding system has recently emerged as a way to overcome the limitations of specific pre-defined vision tasks.