Long Term Action Anticipation
6 papers with code • 1 benchmark • 1 dataset
Most implemented papers
Video + CLIP Baseline for Ego4D Long-term Action Anticipation
The CLIP embedding provides fine-grained understanding of objects relevant to an action, whereas the SlowFast network models temporal information within a video clip of a few frames.
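The two-stream fusion described above can be sketched as follows. All names, dimensions, and the linear head are illustrative assumptions, not the challenge baseline's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the baseline's exact dimensions)
N_FRAMES, D_CLIP, D_SLOWFAST, N_CLASSES = 8, 512, 2304, 115

# Per-frame CLIP embeddings capture object-level semantics...
clip_feats = rng.standard_normal((N_FRAMES, D_CLIP))
# ...while one SlowFast feature summarizes clip-level motion.
slowfast_feat = rng.standard_normal(D_SLOWFAST)

def fuse_and_classify(clip_feats, slowfast_feat, w, b):
    """Mean-pool the frame embeddings, concatenate with the motion
    feature, and apply a linear head over action classes."""
    pooled = clip_feats.mean(axis=0)                 # (D_CLIP,)
    fused = np.concatenate([pooled, slowfast_feat])  # (D_CLIP + D_SLOWFAST,)
    return fused @ w + b                             # (N_CLASSES,) logits

w = rng.standard_normal((D_CLIP + D_SLOWFAST, N_CLASSES)) * 0.01
b = np.zeros(N_CLASSES)
logits = fuse_and_classify(clip_feats, slowfast_feat, w, b)
```

In a trained model the two streams would typically be projected to a shared dimension before fusion; plain concatenation is the simplest variant.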
Intention-Conditioned Long-Term Human Egocentric Action Forecasting
Our framework first extracts two levels of human information over the N observed videos, human actions, through a Hierarchical Multi-task MLP Mixer (H3M).
Learning State-Aware Visual Representations from Audible Interactions
However, learning representations from videos can be challenging.
Rethinking Learning Approaches for Long-Term Action Anticipation
Action anticipation involves predicting future actions having observed the initial portion of a video.
HierVL: Learning Hierarchical Video-Language Embeddings
Video-language embeddings are a promising avenue for injecting semantics into visual representations, but existing methods capture only short-term associations between seconds-long video clips and their accompanying text.
Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023
We present Palm, a solution to the Long-Term Action Anticipation (LTA) task utilizing vision-language and large language models.
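The prompt-building step implied by this pipeline can be sketched minimally: actions recognized in the observed clips are serialized into text, and a language model is asked to continue the sequence. The prompt format and helper name below are assumptions, not Palm's actual implementation:

```python
# Hypothetical observed (verb, noun) pairs, e.g. from a recognition or
# captioning model over the observed video segment.
observed = [("take", "knife"), ("cut", "onion"), ("open", "fridge")]

def build_lta_prompt(observed_actions, n_future=3):
    """Serialize observed actions into a prompt asking an LLM to
    anticipate the next actions (format is an illustrative assumption)."""
    lines = [f"{i + 1}. {verb} {noun}"
             for i, (verb, noun) in enumerate(observed_actions)]
    return ("Observed actions:\n" + "\n".join(lines)
            + f"\nPredict the next {n_future} actions:")

prompt = build_lta_prompt(observed)
```

The LLM's text completion would then be parsed back into discrete verb/noun predictions for the LTA benchmark metrics.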
Object-centric Video Representation for Long-term Action Anticipation
To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales.
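The "retrieval" of relevant objects via a Transformer reduces, in its simplest single-head form, to attention between a video query and per-object features: the attention weights act as soft relevance scores. The sketch below uses assumed dimensions and plain NumPy rather than the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64                                       # shared feature dim (assumption)

video_query = rng.standard_normal((1, D))    # one anticipation query
object_feats = rng.standard_normal((5, D))   # features of 5 detected objects

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def retrieve_objects(query, objects):
    """Scaled dot-product attention: the weights over objects act as a
    soft retrieval of those most relevant to the anticipated action."""
    scores = query @ objects.T / np.sqrt(objects.shape[-1])  # (1, 5)
    weights = softmax(scores)
    return weights @ objects, weights        # attended feature, relevance

attended, weights = retrieve_objects(video_query, object_feats)
```

Running the same attention over features pooled at different temporal strides is one way to realize the "various time scales" mentioned above.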