no code implementations • CVPR 2021 • Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva
With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video.
1 code implementation • ECCV 2020 • Alex Andonian, Camilo Fosco, Mathew Monfort, Allen Lee, Rogerio Feris, Carl Vondrick, Aude Oliva
This allows our model to perform cognitive tasks such as set abstraction (which general concept is common to a set of videos?).
2 code implementations • 1 Nov 2019 • Mathew Monfort, Bowen Pan, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva
Videos capture events that typically contain multiple sequential and simultaneous actions, even in the span of only a few seconds.
no code implementations • ICCV 2019 • Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou
The model not only finds when an action is happening and which object is being manipulated, but also identifies which part of the object is being interacted with.
no code implementations • 28 May 2019 • Mathew Monfort, Kandan Ramakrishnan, Barry A McNamara, Alex Lascelles, Dan Gutfreund, Rogerio Feris, Aude Oliva
A number of recent methods to understand neural networks have focused on quantifying the role of individual features.
1 code implementation • CVPR 2019 • Tianyang Zhao, Yifei Xu, Mathew Monfort, Wongun Choi, Chris Baker, Yibiao Zhao, Yizhou Wang, Ying Nian Wu
Specifically, the model encodes multiple agents' past trajectories and the scene context into a Multi-Agent Tensor, then applies convolutional fusion to capture multi-agent interactions while retaining the spatial structure of agents and the scene context.
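The tensor construction and fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_multi_agent_tensor`, `conv3x3_mean`, `fuse`) are hypothetical, the per-agent encodings are assumed to be already computed, and a fixed uniform 3x3 kernel stands in for the learned convolutional weights. The point is only to show how placing agent encodings at their grid cells alongside scene features lets a convolution mix information between nearby agents while keeping the spatial layout.

```python
from typing import Dict, List, Tuple

Grid = List[List[List[float]]]  # indexed as [row][col][channel]


def build_multi_agent_tensor(
    scene: Grid, agent_codes: Dict[Tuple[int, int], List[float]]
) -> Grid:
    """Concatenate scene channels with agent encodings placed at agent cells.

    Cells without an agent receive a zero vector for the agent channels.
    """
    code_dim = len(next(iter(agent_codes.values())))
    tensor = []
    for r, row in enumerate(scene):
        out_row = []
        for c, scene_feat in enumerate(row):
            code = agent_codes.get((r, c), [0.0] * code_dim)
            out_row.append(list(scene_feat) + list(code))
        tensor.append(out_row)
    return tensor


def conv3x3_mean(tensor: Grid) -> Grid:
    """Apply one 3x3 convolution with zero padding.

    Assumption: a fixed uniform kernel (each weight 1/9) replaces learned weights.
    """
    rows, cols, chans = len(tensor), len(tensor[0]), len(tensor[0][0])
    out = [[[0.0] * chans for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        for k in range(chans):
                            out[r][c][k] += tensor[rr][cc][k] / 9.0
    return out


def fuse(
    scene: Grid, agent_codes: Dict[Tuple[int, int], List[float]]
) -> Dict[Tuple[int, int], List[float]]:
    """Build the tensor, fuse it, and read back the features at each agent's cell."""
    fused = conv3x3_mean(build_multi_agent_tensor(scene, agent_codes))
    return {pos: fused[pos[0]][pos[1]] for pos in agent_codes}


if __name__ == "__main__":
    # A 3x3 scene with one (empty) scene channel and two adjacent agents.
    scene = [[[0.0] for _ in range(3)] for _ in range(3)]
    agents = {(0, 0): [9.0], (1, 1): [18.0]}
    print(fuse(scene, agents))
```

Because the agents sit in neighbouring cells, each agent's fused feature contains a contribution from the other's encoding; agents placed far apart would not influence each other after a single 3x3 layer.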
4 code implementations • 9 Jan 2018 • Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva
We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds.
114 code implementations • 25 Apr 2016 • Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, Karol Zieba
The system automatically learns internal representations of the necessary processing steps, such as detecting useful road features, with only the human steering angle as the training signal.
no code implementations • NeurIPS 2015 • Mathew Monfort, Brenden M. Lake, Brian Ziebart, Patrick Lucey, Josh Tenenbaum
Recent machine learning methods for sequential behavior prediction estimate the motives of behavior rather than the behavior itself.