no code implementations • 11 Jan 2024 • Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.
1 code implementation • 29 Nov 2023 • Jang Hyun Cho, Philipp Krähenbühl
We use this detector to pseudo-label images with image-level labels.
1 code implementation • NeurIPS 2023 • Jeffrey Ouyang-Zhang, Daniel J. Diaz, Adam R. Klivans, Philipp Krähenbühl
We build Mutate Everything on top of ESM2 and AlphaFold, neither of which was trained to predict thermodynamic stability.
1 code implementation • 28 Sep 2023 • Yue Zhao, Philipp Krähenbühl
Videos are big, complex to pre-process, and slow to train on.
Ranked #1 on Multi-Instance Retrieval on EPIC-KITCHENS-100
1 code implementation • 23 Jan 2023 • Jang Hyun Cho, Philipp Krähenbühl
Large-scale object detection and instance segmentation face a severe data imbalance.
1 code implementation • CVPR 2023 • Jang Hyun Cho, Philipp Krähenbühl, Vignesh Ramanathan
PartDistillation transfers the part information of an instance segmentation model into a part segmentation model through self-supervised self-training on a large dataset.
1 code implementation • 12 Dec 2022 • Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl
Our detector, which trains Deformable-DETR with traditional IoU-based label assignment, achieves 50.2 COCO mAP within 12 epochs (1x schedule) with a ResNet50 backbone, outperforming all existing traditional and transformer-based detectors in this setting.
Ranked #2 on Object Detection on COCO-O (using extra training data)
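The "traditional IoU-based label assignment" mentioned above can be sketched in a few lines; this is an illustrative minimal version (function names and the 0.5 threshold are my own choices, not the paper's code):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def assign_labels(anchors, gt_boxes, pos_thresh=0.5):
    """For each anchor, return the index of its best-overlapping
    ground-truth box if that overlap clears the threshold, else -1
    (background)."""
    labels = []
    for a in anchors:
        ious = [iou(a, g) for g in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__) if ious else -1
        labels.append(best if ious and ious[best] >= pos_thresh else -1)
    return labels
```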
2 code implementations • CVPR 2023 • Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar
We introduce LaViLa, a new approach to learning video-language representations by leveraging Large Language Models (LLMs).
Ranked #1 on Action Recognition on Charades-Ego
1 code implementation • 19 Sep 2022 • Yue Zhao, Philipp Krähenbühl
Streaming video recognition reasons about objects and their actions in every frame of a video.
2 code implementations • CVPR 2022 • Brady Zhou, Philipp Krähenbühl
The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation.
Ranked #8 on Bird's-Eye View Semantic Segmentation on nuScenes
1 code implementation • CVPR 2022 • Xingyi Zhou, Tianwei Yin, Vladlen Koltun, Philipp Krähenbühl
The transformer encodes object features from all frames, and uses trajectory queries to group them into trajectories.
Ranked #13 on Multi-Object Tracking on SportsMOT (using extra training data)
1 code implementation • CVPR 2022 • Dian Chen, Philipp Krähenbühl
In this paper, we present a system to train driving policies from experiences collected not just from the ego-vehicle, but all vehicles that it observes.
Ranked #5 on Autonomous Driving on CARLA Leaderboard
1 code implementation • 7 Jan 2022 • Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra
For the first time, we train a detector with all twenty-one thousand classes of the ImageNet dataset and show that it generalizes to new datasets without finetuning.
Ranked #2 on Open Vocabulary Object Detection on OpenImages-v4
1 code implementation • NeurIPS 2021 • Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl
For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one or two measurements.
Ranked #63 on 3D Object Detection on nuScenes
2 code implementations • CVPR 2021 • Chao-yuan Wu, Philipp Krähenbühl
Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds.
Ranked #26 on Action Recognition on AVA v2.2
1 code implementation • ICCV 2021 • Dian Chen, Vladlen Koltun, Philipp Krähenbühl
This assumption greatly simplifies the learning problem, factorizing the dynamics into a nonreactive world model and a low-dimensional and compact forward model of the ego-vehicle.
Ranked #12 on Autonomous Driving on CARLA Leaderboard
2 code implementations • 12 Mar 2021 • Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl
We develop a probabilistic interpretation of two-stage object detection.
Ranked #20 on Object Detection on COCO-O
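The probabilistic interpretation amounts to treating the final detection score as a joint probability: a first-stage objectness likelihood multiplied by the second-stage class-conditional. A hand-written sketch of the idea (not the released code):

```python
def detection_score(p_object, p_class_given_object):
    """Joint probability that a proposal contains an object of the
    given class: P(class) = P(object) * P(class | object)."""
    return p_object * p_class_given_object

# A proposal whose classifier is confident but whose objectness is
# weak gets down-weighted, and vice versa:
high_obj_ambiguous_cls = detection_score(0.9, 0.5)   # 0.45
low_obj_confident_cls = detection_score(0.2, 0.95)   # 0.19
```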
1 code implementation • CVPR 2022 • Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl
Experiments show our learned taxonomy outperforms an expert-designed taxonomy on all datasets.
1 code implementation • ICLR 2021 • Aashaka Shah, Chao-yuan Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Krähenbühl
Deep learning is slowly, but steadily, hitting a memory bottleneck.
1 code implementation • 27 Aug 2020 • Brady Zhou, Nimit Kalra, Philipp Krähenbühl
We use these recognition datasets to link up a source and target domain to transfer models between them in a task distillation framework.
11 code implementations • CVPR 2021 • Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl
Three-dimensional objects are commonly represented as 3D boxes in a point cloud.
Ranked #1 on Robust 3D Object Detection on nuScenes-C
1 code implementation • 6 Apr 2020 • Sheng Cao, Chao-yuan Wu, Philipp Krähenbühl
We introduce a simple and efficient lossless image compression algorithm.
7 code implementations • ECCV 2020 • Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl
Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection.
Ranked #4 on Multiple Object Tracking on KITTI Tracking test
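The tracking-by-detection pipeline that this abstract contrasts against can be sketched as greedy IoU association between consecutive frames (an illustrative sketch with made-up names and threshold, not the paper's tracker):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def greedy_associate(prev_boxes, curr_boxes, iou_thresh=0.3):
    """Match each current detection to the unused previous-frame box
    of highest IoU; unmatched detections would start new tracks."""
    matches, used = {}, set()
    for ci, c in enumerate(curr_boxes):
        best_iou, best_pi = iou_thresh, None
        for pi, p in enumerate(prev_boxes):
            if pi in used:
                continue
            v = iou(c, p)
            if v > best_iou:
                best_iou, best_pi = v, pi
        if best_pi is not None:
            matches[ci] = best_pi
            used.add(best_pi)
    return matches  # maps current index -> previous index
```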
9 code implementations • 27 Dec 2019 • Dian Chen, Brady Zhou, Vladlen Koltun, Philipp Krähenbühl
We first train an agent that has access to privileged information.
Ranked #16 on Autonomous Driving on CARLA Leaderboard
3 code implementations • CVPR 2020 • Chao-yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl
We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).
Ranked #1 on Video Classification on Kinetics
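The grid-schedule idea, trading clip length and spatial resolution against batch size at roughly constant compute per step, can be sketched as follows (scaling factors and function names are illustrative, not the paper's recipe):

```python
def multigrid_schedule(base_batch, base_frames, base_res, grids):
    """Yield (batch, frames, resolution) triples with roughly constant
    per-step compute: compute ~ batch * frames * resolution^2, so
    shrinking the clip temporally/spatially lets the batch grow."""
    for t_scale, s_scale in grids:
        frames = max(1, base_frames // t_scale)
        res = base_res // s_scale
        batch = base_batch * t_scale * s_scale * s_scale
        yield batch, frames, res
```

With the base shape (8, 16, 224), a (2, 2) grid halves frames and resolution and grows the batch eightfold, keeping the step cost comparable.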
no code implementations • 30 May 2019 • Brady Zhou, Philipp Krähenbühl, Vladlen Koltun
Thus the central question of our work: Does computer vision matter for action?
no code implementations • 16 May 2019 • Dequan Wang, Coline Devin, Qi-Zhi Cai, Philipp Krähenbühl, Trevor Darrell
Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth.
no code implementations • ICLR 2019 • Brady Zhou, Philipp Krähenbühl
We experimentally show that any GAN objective, including Wasserstein GANs, benefits from adversarial robustness both quantitatively and qualitatively.
77 code implementations • 16 Apr 2019 • Xingyi Zhou, Dequan Wang, Philipp Krähenbühl
We model an object as a single point: the center point of its bounding box.
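The center-point formulation can be sketched as peak-picking on a heatmap plus a size readout at each peak (a toy decoder with invented names and thresholds, not the released implementation):

```python
import numpy as np

def decode_centers(heatmap, sizes, score_thresh=0.5):
    """Every local maximum of the center heatmap above the threshold
    is an object; the size map at that pixel gives box width/height."""
    boxes = []
    h, w = heatmap.shape
    for y in range(h):
        for x in range(w):
            s = heatmap[y, x]
            if s < score_thresh:
                continue
            # A 3x3 local-maximum check stands in for NMS.
            patch = heatmap[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
            if s < patch.max():
                continue
            bw, bh = sizes[y, x]
            boxes.append((x - bw / 2, y - bh / 2,
                          x + bw / 2, y + bh / 2, s))
    return boxes
```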
2 code implementations • CVPR 2019 • Xingyi Zhou, Jiacheng Zhuo, Philipp Krähenbühl
With the advent of deep learning, object detection drifted from a bottom-up to a top-down recognition problem.
Ranked #128 on Object Detection on COCO minival
4 code implementations • CVPR 2019 • Chao-yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
Ranked #4 on Egocentric Activity Recognition on EPIC-KITCHENS-55
1 code implementation • ICCV 2019 • Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu
The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.
Ranked #12 on Multiple Object Tracking on KITTI Tracking test
1 code implementation • ICLR 2019 • Charles Packer, Katelyn Gao, Jernej Kos, Philipp Krähenbühl, Vladlen Koltun, Dawn Song
Our aim is to catalyze community-wide progress on generalization in deep RL.
1 code implementation • ECCV 2018 • Chao-yuan Wu, Nayan Singhal, Philipp Krähenbühl
An ever-increasing amount of our digital communication, media consumption, and content creation revolves around videos.
1 code implementation • CVPR 2018 • Chao-yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
We propose to train a deep network directly on the compressed video.
Ranked #46 on Action Classification on Charades (using extra training data)
6 code implementations • ICCV 2017 • Chao-yuan Wu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions.
Ranked #5 on Image Retrieval on CARS196
1 code implementation • 12 Sep 2016 • Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros
Realistic image manipulation is challenging because it requires modifying the image appearance in a user-controlled way, while preserving the realism of the result.
10 code implementations • 31 May 2016 • Jeff Donahue, Philipp Krähenbühl, Trevor Darrell
The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing that the latent space of such generators captures semantic variation in the data distribution.
no code implementations • CVPR 2016 • Tinghui Zhou, Philipp Krähenbühl, Mathieu Aubry, Qi-Xing Huang, Alexei A. Efros
We use ground-truth synthetic-to-synthetic correspondences, provided by the rendering engine, to train a ConvNet to predict synthetic-to-real, real-to-real and real-to-synthetic correspondences that are cycle-consistent with the ground-truth.
no code implementations • 23 Nov 2015 • Deepak Pathak, Philipp Krähenbühl, Stella X. Yu, Trevor Darrell
We present a regression framework which models the output distribution of neural networks.
2 code implementations • 21 Nov 2015 • Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell
Convolutional Neural Networks spread through computer vision like a wildfire, impacting almost all visual tasks imaginable.
no code implementations • ICCV 2015 • Tinghui Zhou, Philipp Krähenbühl, Alexei A. Efros
We propose a data-driven approach for intrinsic image decomposition, which is the process of inferring the confounding factors of reflectance and shading in an image.
1 code implementation • ICCV 2015 • Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros
What makes an image appear realistic?
1 code implementation • ICCV 2015 • Deepak Pathak, Philipp Krähenbühl, Trevor Darrell
We propose Constrained CNN (CCNN), a method that uses a novel loss function to optimize for any set of linear constraints on the output space (i.e., the predicted label distribution) of a CNN.
3 code implementations • 20 Oct 2012 • Philipp Krähenbühl, Vladlen Koltun
In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image.