no code implementations • 11 Jan 2024 • Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.
1 code implementation • 29 Nov 2023 • Jang Hyun Cho, Philipp Krähenbühl
We use this detector to pseudo-label images with image-level labels.
1 code implementation • NeurIPS 2023 • Jeffrey Ouyang-Zhang, Daniel J. Diaz, Adam R. Klivans, Philipp Krähenbühl
We build Mutate Everything on top of ESM2 and AlphaFold, neither of which was trained to predict thermodynamic stability.
1 code implementation • 28 Sep 2023 • Yue Zhao, Philipp Krähenbühl
Videos are big, complex to pre-process, and slow to train on.
Ranked #1 on Multi-Instance Retrieval on EPIC-KITCHENS-100
1 code implementation • 23 Jan 2023 • Jang Hyun Cho, Philipp Krähenbühl
Large-scale object detection and instance segmentation face a severe data imbalance.
1 code implementation • CVPR 2023 • Jang Hyun Cho, Philipp Krähenbühl, Vignesh Ramanathan
PartDistillation transfers the part information of an instance segmentation model into a part segmentation model through self-supervised self-training on a large dataset.
1 code implementation • 12 Dec 2022 • Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl
Our detector, which trains Deformable-DETR with traditional IoU-based label assignment, achieves 50.2 COCO mAP within 12 epochs (1x schedule) with a ResNet50 backbone, outperforming all existing traditional and transformer-based detectors in this setting.
Ranked #2 on Object Detection on COCO-O (using extra training data)
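The "traditional IoU-based label assignment" mentioned above can be sketched in a few lines; this is an illustrative minimal version (function names and the 0.5 threshold are my own choices, not the paper's code):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def assign_labels(anchors, gt_boxes, pos_thresh=0.5):
    """For each anchor, return the index of its best-overlapping
    ground-truth box if that overlap clears the threshold, else -1
    (background)."""
    labels = []
    for a in anchors:
        ious = [iou(a, g) for g in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__) if ious else -1
        labels.append(best if ious and ious[best] >= pos_thresh else -1)
    return labels
```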
2 code implementations • CVPR 2023 • Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar
We introduce LaViLa, a new approach to learning video-language representations by leveraging Large Language Models (LLMs).
Ranked #1 on Action Recognition on Charades-Ego
1 code implementation • 19 Sep 2022 • Yue Zhao, Philipp Krähenbühl
Streaming video recognition reasons about objects and their actions in every frame of a video.
2 code implementations • CVPR 2022 • Brady Zhou, Philipp Krähenbühl
The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation.
Ranked #8 on Bird's-Eye View Semantic Segmentation on nuScenes
1 code implementation • CVPR 2022 • Xingyi Zhou, Tianwei Yin, Vladlen Koltun, Philipp Krähenbühl
The transformer encodes object features from all frames, and uses trajectory queries to group them into trajectories.
Ranked #13 on Multi-Object Tracking on SportsMOT (using extra training data)
1 code implementation • CVPR 2022 • Dian Chen, Philipp Krähenbühl
In this paper, we present a system to train driving policies from experiences collected not just from the ego-vehicle, but all vehicles that it observes.
Ranked #5 on Autonomous Driving on CARLA Leaderboard
1 code implementation • 7 Jan 2022 • Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra
For the first time, we train a detector with all twenty-one thousand classes of the ImageNet dataset and show that it generalizes to new datasets without finetuning.
Ranked #2 on Open Vocabulary Object Detection on OpenImages-v4
1 code implementation • NeurIPS 2021 • Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl
For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one or two measurements.
Ranked #63 on 3D Object Detection on nuScenes
2 code implementations • CVPR 2021 • Chao-yuan Wu, Philipp Krähenbühl
Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds.
Ranked #26 on Action Recognition on AVA v2.2
1 code implementation • ICCV 2021 • Dian Chen, Vladlen Koltun, Philipp Krähenbühl
This assumption greatly simplifies the learning problem, factorizing the dynamics into a nonreactive world model and a low-dimensional and compact forward model of the ego-vehicle.
Ranked #12 on Autonomous Driving on CARLA Leaderboard
2 code implementations • 12 Mar 2021 • Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl
We develop a probabilistic interpretation of two-stage object detection.
Ranked #20 on Object Detection on COCO-O
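The probabilistic interpretation amounts to treating the final detection score as a joint probability: a first-stage objectness likelihood multiplied by the second-stage class-conditional. A hand-written sketch of the idea (not the released code):

```python
def detection_score(p_object, p_class_given_object):
    """Joint probability that a proposal contains an object of the
    given class: P(class) = P(object) * P(class | object)."""
    return p_object * p_class_given_object

# A proposal whose classifier is confident but whose objectness is
# weak gets down-weighted, and vice versa:
high_obj_ambiguous_cls = detection_score(0.9, 0.5)   # 0.45
low_obj_confident_cls = detection_score(0.2, 0.95)   # 0.19
```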
1 code implementation • CVPR 2022 • Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl
Experiments show our learned taxonomy outperforms an expert-designed taxonomy on all datasets.
1 code implementation • ICLR 2021 • Aashaka Shah, Chao-yuan Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Krähenbühl
Deep learning is slowly, but steadily, hitting a memory bottleneck.
1 code implementation • 27 Aug 2020 • Brady Zhou, Nimit Kalra, Philipp Krähenbühl
We use these recognition datasets to link up a source and target domain to transfer models between them in a task distillation framework.
11 code implementations • CVPR 2021 • Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl
Three-dimensional objects are commonly represented as 3D boxes in a point cloud.
Ranked #1 on Robust 3D Object Detection on nuScenes-C
1 code implementation • 6 Apr 2020 • Sheng Cao, Chao-yuan Wu, Philipp Krähenbühl
We introduce a simple and efficient lossless image compression algorithm.
7 code implementations • ECCV 2020 • Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl
Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection.
Ranked #4 on Multiple Object Tracking on KITTI Tracking test
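The tracking-by-detection pipeline that this abstract contrasts against can be sketched as greedy IoU association between consecutive frames (an illustrative sketch with made-up names and threshold, not the paper's tracker):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def greedy_associate(prev_boxes, curr_boxes, iou_thresh=0.3):
    """Match each current detection to the unused previous-frame box
    of highest IoU; unmatched detections would start new tracks."""
    matches, used = {}, set()
    for ci, c in enumerate(curr_boxes):
        best_iou, best_pi = iou_thresh, None
        for pi, p in enumerate(prev_boxes):
            if pi in used:
                continue
            v = iou(c, p)
            if v > best_iou:
                best_iou, best_pi = v, pi
        if best_pi is not None:
            matches[ci] = best_pi
            used.add(best_pi)
    return matches  # maps current index -> previous index
```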
9 code implementations • 27 Dec 2019 • Dian Chen, Brady Zhou, Vladlen Koltun, Philipp Krähenbühl
We first train an agent that has access to privileged information.
Ranked #16 on Autonomous Driving on CARLA Leaderboard
3 code implementations • CVPR 2020 • Chao-yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl
We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).
Ranked #1 on Video Classification on Kinetics
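The grid-schedule idea, trading clip length and spatial resolution against batch size at roughly constant compute per step, can be sketched as follows (scaling factors and function names are illustrative, not the paper's recipe):

```python
def multigrid_schedule(base_batch, base_frames, base_res, grids):
    """Yield (batch, frames, resolution) triples with roughly constant
    per-step compute: compute ~ batch * frames * resolution^2, so
    shrinking the clip temporally/spatially lets the batch grow."""
    for t_scale, s_scale in grids:
        frames = max(1, base_frames // t_scale)
        res = base_res // s_scale
        batch = base_batch * t_scale * s_scale * s_scale
        yield batch, frames, res
```

With the base shape (8, 16, 224), a (2, 2) grid halves frames and resolution and grows the batch eightfold, keeping the step cost comparable.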
no code implementations • 30 May 2019 • Brady Zhou, Philipp Krähenbühl, Vladlen Koltun
Thus the central question of our work: Does computer vision matter for action?
no code implementations • 16 May 2019 • Dequan Wang, Coline Devin, Qi-Zhi Cai, Philipp Krähenbühl, Trevor Darrell
Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth.
no code implementations • ICLR 2019 • Brady Zhou, Philipp Krähenbühl
We experimentally show that any GAN objective, including Wasserstein GANs, benefits from adversarial robustness both quantitatively and qualitatively.
77 code implementations • 16 Apr 2019 • Xingyi Zhou, Dequan Wang, Philipp Krähenbühl
We model an object as a single point: the center point of its bounding box.
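The center-point formulation can be sketched as peak-picking on a heatmap plus a size readout at each peak (a toy decoder with invented names and thresholds, not the released implementation):

```python
import numpy as np

def decode_centers(heatmap, sizes, score_thresh=0.5):
    """Every local maximum of the center heatmap above the threshold
    is an object; the size map at that pixel gives box width/height."""
    boxes = []
    h, w = heatmap.shape
    for y in range(h):
        for x in range(w):
            s = heatmap[y, x]
            if s < score_thresh:
                continue
            # A 3x3 local-maximum check stands in for NMS.
            patch = heatmap[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
            if s < patch.max():
                continue
            bw, bh = sizes[y, x]
            boxes.append((x - bw / 2, y - bh / 2,
                          x + bw / 2, y + bh / 2, s))
    return boxes
```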
2 code implementations • CVPR 2019 • Xingyi Zhou, Jiacheng Zhuo, Philipp Krähenbühl
With the advent of deep learning, object detection drifted from a bottom-up to a top-down recognition problem.
Ranked #128 on Object Detection on COCO minival
4 code implementations • CVPR 2019 • Chao-yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
Ranked #4 on Egocentric Activity Recognition on EPIC-KITCHENS-55
1 code implementation • ICCV 2019 • Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu
The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.
Ranked #12 on Multiple Object Tracking on KITTI Tracking test
1 code implementation • ICLR 2019 • Charles Packer, Katelyn Gao, Jernej Kos, Philipp Krähenbühl, Vladlen Koltun, Dawn Song
Our aim is to catalyze community-wide progress on generalization in deep RL.
1 code implementation • ECCV 2018 • Chao-yuan Wu, Nayan Singhal, Philipp Krähenbühl
An ever-increasing amount of our digital communication, media consumption, and content creation revolves around videos.
1 code implementation • CVPR 2018 • Chao-yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
We propose to train a deep network directly on the compressed video.
Ranked #46 on Action Classification on Charades (using extra training data)
6 code implementations • ICCV 2017 • Chao-yuan Wu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions.
Ranked #5 on Image Retrieval on CARS196
1 code implementation • 12 Sep 2016 • Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros
Realistic image manipulation is challenging because it requires modifying the image appearance in a user-controlled way, while preserving the realism of the result.
10 code implementations • 31 May 2016 • Jeff Donahue, Philipp Krähenbühl, Trevor Darrell
The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing that the latent space of such generators captures semantic variation in the data distribution.
no code implementations • CVPR 2016 • Tinghui Zhou, Philipp Krähenbühl, Mathieu Aubry, Qi-Xing Huang, Alexei A. Efros
We use ground-truth synthetic-to-synthetic correspondences, provided by the rendering engine, to train a ConvNet to predict synthetic-to-real, real-to-real and real-to-synthetic correspondences that are cycle-consistent with the ground-truth.
no code implementations • 23 Nov 2015 • Deepak Pathak, Philipp Krähenbühl, Stella X. Yu, Trevor Darrell
We present a regression framework which models the output distribution of neural networks.
2 code implementations • 21 Nov 2015 • Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell
Convolutional Neural Networks spread through computer vision like a wildfire, impacting almost all visual tasks imaginable.
no code implementations • ICCV 2015 • Tinghui Zhou, Philipp Krähenbühl, Alexei A. Efros
We propose a data-driven approach for intrinsic image decomposition, which is the process of inferring the confounding factors of reflectance and shading in an image.
1 code implementation • ICCV 2015 • Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros
What makes an image appear realistic?
1 code implementation • ICCV 2015 • Deepak Pathak, Philipp Krähenbühl, Trevor Darrell
We propose Constrained CNN (CCNN), a method that uses a novel loss function to optimize for any set of linear constraints on the output space (i.e., the predicted label distribution) of a CNN.
3 code implementations • 20 Oct 2012 • Philipp Krähenbühl, Vladlen Koltun
In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image.