18 code implementations • ICCV 2023 • Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation.
Ranked #2 on Zero-Shot Instance Segmentation on LVIS v1.0 val
1 code implementation • ICCV 2023 • Mannat Singh, Quentin Duval, Kalyan Vasudev Alwala, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra
While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well.
Ranked #1 on Few-Shot Image Classification on ImageNet - 10-shot (using extra training data)
2 code implementations • CVPR 2022 • Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Dollár, Laurens van der Maaten
Model pre-training is a cornerstone of modern visual recognition systems.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W (using extra training data)
49 code implementations • CVPR 2022 • Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
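The masking step described in this entry can be illustrated with a minimal sketch. It covers only the random choice of which patches are hidden, not the encoder or the pixel reconstruction. The function name `random_patch_mask`, the 75% mask ratio, and the 196-patch example (a 14×14 grid) are assumptions for illustration and are not stated in the listing.

```python
import random

def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into a visible set and a masked set at random.

    mask_ratio is an assumed default; the listing does not give a value.
    """
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)                   # random permutation of patch indices
    visible = sorted(order[:num_keep])   # patches a model would see
    masked = sorted(order[num_keep:])    # patches whose pixels would be reconstructed
    return visible, masked

# Hypothetical example: a 14x14 grid of patches
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
```

Every patch index lands in exactly one of the two lists, so a reconstruction loss could be computed over `masked` alone.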
1 code implementation • NeurIPS 2021 • Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick
To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3×3 convolutions.
2 code implementations • CVPR 2021 • Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov
We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors for large objects and does not over-penalize errors on smaller objects.
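To make the contrast with Mask IoU concrete, here is a rough sketch that restricts each mask to a thin band along its contour before computing IoU. It is one possible reading, not the authors' reference implementation: masks are sets of (row, col) pixels, the band is obtained by `d` rounds of 4-neighbour erosion, and the names `boundary_region`, `mask_iou`, and `boundary_iou` are hypothetical.

```python
def boundary_region(mask, d):
    """Pixels of `mask` removed by d rounds of 4-neighbour erosion (assumed definition)."""
    eroded = set(mask)
    for _ in range(d):
        eroded = {
            (r, c) for (r, c) in eroded
            if {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)} <= eroded
        }
    return set(mask) - eroded

def mask_iou(a, b):
    """Standard intersection over union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def boundary_iou(gt, pred, d=1):
    """IoU computed only over the boundary bands of the two masks."""
    return mask_iou(boundary_region(gt, d), boundary_region(pred, d))

# Toy example: a 4x4 square and the same square shifted right by one pixel
gt = {(r, c) for r in range(4) for c in range(4)}
pred = {(r, c) for r in range(4) for c in range(1, 5)}
print(mask_iou(gt, pred), boundary_iou(gt, pred, d=1))
```

In this toy case the shifted square keeps a Mask IoU of 0.6 while the boundary-restricted score drops to about 0.33, which is the kind of sensitivity to contour errors the entry describes.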
4 code implementations • CVPR 2021 • Piotr Dollár, Mannat Singh, Ross Girshick
This leads us to propose a simple fast compound scaling strategy that encourages primarily scaling model width, while scaling depth and resolution to a lesser extent.
2 code implementations • 1 Feb 2021 • Achal Dave, Piotr Dollár, Deva Ramanan, Alexander Kirillov, Ross Girshick
On one hand, this is desirable as it treats all classes equally.
24 code implementations • CVPR 2020 • Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár
In this work, we present a new network design paradigm.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
2 code implementations • ECCV 2020 • Chenxi Liu, Piotr Dollár, Kaiming He, Ross Girshick, Alan Yuille, Saining Xie
Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels.
3 code implementations • CVPR 2019 • Agrim Gupta, Piotr Dollár, Ross Girshick
We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images.
4 code implementations • ICCV 2019 • Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár
Compared to current methodologies of comparing point and curve estimates of model families, distribution estimates paint a more complete picture of the entire design landscape.
2 code implementations • ICCV 2019 • Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár
To formalize this, we treat dense instance segmentation as a prediction task over 4D tensors and present a general framework called TensorMask that explicitly captures this geometry and enables novel operators on 4D tensors.
Ranked #90 on Instance Segmentation on COCO test-dev
12 code implementations • CVPR 2019 • Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár
In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks.
Ranked #4 on Panoptic Segmentation on Indian Driving Dataset
1 code implementation • ICCV 2019 • Kaiming He, Ross Girshick, Piotr Dollár
We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization.
Ranked #81 on Object Detection on COCO minival
9 code implementations • CVPR 2019 • Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár
We propose and study a task we name panoptic segmentation (PS).
Ranked #23 on Panoptic Segmentation on Cityscapes val (using extra training data)
4 code implementations • CVPR 2018 • Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He
We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data.
3 code implementations • CVPR 2018 • Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick
Most methods for object instance segmentation require all training examples to be labeled with segmentation masks.
230 code implementations • ICCV 2017 • Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Ranked #3 on Region Proposal on COCO test-dev
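The down-weighting of easy examples described in this entry can be sketched for a single binary prediction. The form used here is FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the function name `focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions for illustration and do not come from the listing.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class
    y: ground-truth label, 1 or 0
    gamma, alpha: assumed defaults, not taken from the listing
    """
    p_t = p if y == 1 else 1.0 - p               # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)              # avoid log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy_negative = focal_loss(0.01, 0)  # confident and correct
hard_negative = focal_loss(0.90, 0)  # confident and wrong
```

With `gamma=0` the modulating factor equals 1 and the expression reduces to alpha-weighted cross-entropy, which makes the effect of `gamma` easy to check in isolation.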
71 code implementations • 8 Jun 2017 • Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.
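A small sketch of the two ideas in this entry: the learning rate grows in proportion to the minibatch size, and a warmup ramps it up gradually at the start of training. The reference values (`base_lr=0.1` at a batch size of 256), the linear ramp shape, and the function names are assumptions for illustration. The listing gives no concrete numbers.

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch=256):
    """Linear scaling rule: multiply the learning rate by batch_size / base_batch."""
    return base_lr * batch_size / base_batch

def warmup_lr(step, warmup_steps, batch_size, base_lr=0.1, base_batch=256):
    """Ramp linearly from base_lr to the scaled rate over warmup_steps (assumed schedule)."""
    target = scaled_lr(batch_size, base_lr, base_batch)
    if step >= warmup_steps:
        return target
    frac = step / warmup_steps
    return base_lr + frac * (target - base_lr)

# Hypothetical run with a minibatch of 8192 and 500 warmup steps
schedule = [warmup_lr(s, 500, 8192) for s in (0, 250, 500, 1000)]
```

Starting the ramp at the small-batch rate rather than at zero is a design choice of this sketch. A run with different stability needs might prefer a different starting point or ramp length.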
2 code implementations • CVPR 2018 • Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He
Our hypothesis is that the appearance of a person -- their pose, clothing, action -- is a powerful cue for localizing the objects they are interacting with.
Ranked #53 on Human-Object Interaction Detection on HICO-DET
172 code implementations • ICCV 2017 • Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
Ranked #1 on Keypoint Estimation on GRIT
1 code implementation • CVPR 2017 • Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan
Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature.
85 code implementations • CVPR 2017 • Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie
Feature pyramids are a basic component in recognition systems for detecting objects at different scales.
Ranked #3 on Pedestrian Detection on TJU-Ped-campus
58 code implementations • CVPR 2017 • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set.
Ranked #3 on Image Classification on GasHisSDB
1 code implementation • 7 Apr 2016 • Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O. Pinheiro, Sam Gross, Soumith Chintala, Piotr Dollár
To address these challenges, we test three modifications to the standard Fast R-CNN object detector: (1) skip connections that give the detector access to features at multiple network layers, (2) a foveal structure to exploit object context at multiple object resolutions, and (3) an integral loss function and corresponding network adjustment that improve localization.
Ranked #104 on Instance Segmentation on COCO test-dev
no code implementations • CVPR 2016 • Yin Li, Manohar Paluri, James M. Rehg, Piotr Dollár
In this work we present a simple yet effective approach for training edge detectors without human supervision.
2 code implementations • CVPR 2017 • Yan Zhu, Yuandong Tian, Dimitris Metaxas, Piotr Dollár
Specifically, we create an amodal segmentation of each image: the full extent of each region is marked, not just the visible pixels.
no code implementations • 17 Feb 2015 • Jan Hosang, Rodrigo Benenson, Piotr Dollár, Bernt Schiele
Current top performing object detectors employ detection proposals to guide the search for objects, thereby avoiding exhaustive sliding window search across images.
1 code implementation • CVPR 2015 • Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig
The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage.
Ranked #1 on Image Captioning on COCO Captions test
no code implementations • 20 Jun 2014 • Piotr Dollár, C. Lawrence Zitnick
We formulate the problem of predicting local edge masks in a structured learning framework applied to random decision forests.
no code implementations • 4 Jun 2014 • Woonhyun Nam, Piotr Dollár, Joon Hee Han
In fact, orthogonal trees with our locally decorrelated features outperform oblique trees trained over the original features at a fraction of the computational cost.
35 code implementations • 1 May 2014 • Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.