Search Results for author: Fahad Shahbaz Khan

Found 181 papers, 135 papers with code

Fixing Localization Errors to Improve Image Classification

1 code implementation • ECCV 2020 • Guolei Sun, Salman Khan, Wen Li, Hisham Cholakkal, Fahad Shahbaz Khan, Luc van Gool

This way, in an effort to fix localization errors, our loss provides an extra supervisory signal that helps the model to better discriminate between similar classes.

Classification General Classification +3

Count- and Similarity-aware R-CNN for Pedestrian Detection

no code implementations • ECCV 2020 • Jin Xie, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Mubarak Shah

We further introduce a count-and-similarity branch within the two-stage detection framework, which predicts pedestrian count as well as proposal similarity.

Human Instance Segmentation Pedestrian Detection +1

How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs

no code implementations • 6 May 2024 • Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Jameel Hassan, Muzammal Naseer, Federico Tombari, Fahad Shahbaz Khan, Salman Khan

Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks.

Autonomous Vehicles Video Understanding

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

no code implementations • 23 Apr 2024 • Wenjin Hou, Shiming Chen, Shuhuang Chen, Ziming Hong, Yan Wang, Xuetao Feng, Salman Khan, Fahad Shahbaz Khan, Xinge You

Generative Zero-shot learning (ZSL) learns a generator to synthesize visual samples for unseen classes, which is an effective way to advance ZSL.

Zero-Shot Learning

Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels

1 code implementation • 15 Apr 2024 • Amaya Dharmasiri, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Thereby we demonstrate that 2D vision language models such as CLIP can be used to complement 3D representation learning to improve classification performance without the need for expensive class annotations.

Representation Learning

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

no code implementations • 11 Apr 2024 • Shiming Chen, Wenjin Hou, Salman Khan, Fahad Shahbaz Khan

ZSLViT mainly considers two properties in the whole network: i) discover the semantic-related visual representations explicitly, and ii) discard the semantic-unrelated visual information.

Zero-Shot Learning

Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration

1 code implementation • 2 Apr 2024 • Akshay Dudhane, Omkar Thawakar, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.

Decoder Image Denoising +2

Language Guided Domain Generalized Medical Image Segmentation

1 code implementation • 1 Apr 2024 • Shahina Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Incorporating text features alongside visual features is a potential solution to enhance the model's understanding of the data, as it goes beyond pixel-level information to provide valuable context.

Contrastive Learning Domain Generalization +4

Efficient Video Object Segmentation via Modulated Cross-Attention Memory

1 code implementation • 26 Mar 2024 • Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation.

Object Segmentation +3

ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection

1 code implementation • 26 Mar 2024 • Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Salman Khan, Fahad Shahbaz Khan

Deep learning has shown remarkable success in remote sensing change detection (CD), aiming to identify semantic change regions between co-registered satellite image pairs acquired at distinct time stamps.

Change Detection

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

1 code implementation • 25 Mar 2024 • Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan

Composed video retrieval (CoVR) is a challenging problem in computer vision which has recently highlighted the integration of modification text with visual queries for more sophisticated video search in large databases.

Composed Video Retrieval (CoVR) Retrieval

Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning

1 code implementation • 21 Mar 2024 • Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Self-supervised representation learning has been highly promising for histopathology image analysis with numerous approaches leveraging their patient-slide-patch hierarchy to learn better representations.

Representation Learning Self-Supervised Learning

VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding

no code implementations • 21 Mar 2024 • Ahmad Mahmood, Ashmal Vayani, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

In contrast, this paper introduces a Video Understanding and Reasoning Framework (VURF) based on the reasoning power of LLMs.

Pose Estimation Video Understanding +1

AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation

1 code implementation • 21 Mar 2024 • Yuning Cui, Syed Waqas Zamir, Salman Khan, Alois Knoll, Mubarak Shah, Fahad Shahbaz Khan

Our approach is motivated by the observation that different degradation types impact the image content on different frequency subbands, thereby requiring different treatments for each restoration task.

Deblurring Denoising +3

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

1 code implementation • 8 Mar 2024 • Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks by pre-training on large amounts of unlabelled data.

Multi-Label Classification

ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes

1 code implementation • 7 Mar 2024 • Hashmat Shadab Malik, Muhammad Huzaifa, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

We produce various versions of standard vision datasets (ImageNet, COCO), incorporating either diverse and realistic backgrounds into the images or introducing color, texture, and adversarial changes in the background.

Object

Effectiveness Assessment of Recent Large Vision-Language Models

no code implementations • 7 Mar 2024 • Yao Jiang, Xinyu Yan, Ge-Peng Ji, Keren Fu, Meijun Sun, Huan Xiong, Deng-Ping Fan, Fahad Shahbaz Khan

This paper endeavors to evaluate the competency of popular LVLMs in specialized and general tasks, respectively, aiming to offer a comprehensive understanding of these novel models.

Anomaly Detection Attribute +7

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

1 code implementation • 26 Feb 2024 • Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan

"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development.

Semi-supervised Open-World Object Detection

1 code implementation • 25 Feb 2024 • Sahal Shaji Mullappilly, Abhishek Singh Gehlot, Rao Muhammad Anwer, Fahad Shahbaz Khan, Hisham Cholakkal

We demonstrate the effectiveness of our SS-OWOD problem setting and approach for remote sensing object detection, proposing carefully curated splits and baseline performance evaluations.

Incremental Learning Object +2

BiMediX: Bilingual Medical Mixture of Experts LLM

1 code implementation • 20 Feb 2024 • Sara Pieri, Sahal Shaji Mullappilly, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal

In this paper, we introduce BiMediX, the first bilingual medical mixture of experts LLM designed for seamless interaction in both English and Arabic.

Multiple-choice Open-Ended Question Answering

Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models

1 code implementation • 8 Feb 2024 • Senmao Li, Joost Van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang

However, these models struggle to effectively suppress the generation of undesired content, which is explicitly requested to be omitted from the generated image in the prompt.

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

no code implementations • 31 Dec 2023 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Our contributions include a novel spatio-temporal video grounding model, surpassing state-of-the-art results in closed-set evaluations on multiple datasets and demonstrating superior performance in open-vocabulary scenarios.

Spatio-Temporal Video Grounding Video Grounding +1

Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

1 code implementation • 15 Dec 2023 • Senmao Li, Taihang Hu, Fahad Shahbaz Khan, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang

This finding inspired us to omit the encoder at certain adjacent time-steps and cyclically reuse the encoder features from previous time-steps for the decoder.

Decoder Knowledge Distillation

Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM

1 code implementation • 14 Dec 2023 • Sahal Shaji Mullappilly, Abdelrahman Shaker, Omkar Thawakar, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

To this end, we propose a light-weight Arabic Mini-ClimateGPT that is built on an open-source LLM and is specifically fine-tuned on Clima500-Instruct, a curated conversational-style Arabic instruction-tuning dataset with over 500k instructions about climate change and sustainability.

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

1 code implementation • 27 Nov 2023 • Bin Xie, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

In this paper, we propose a simple encoder-decoder, named SED, for open-vocabulary semantic segmentation, which comprises a hierarchical encoder-based cost map generation and a gradual fusion decoder with category early rejection.

Ranked #3 on Open Vocabulary Semantic Segmentation on PASCAL Context-459

Decoder Open Vocabulary Semantic Segmentation +2

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

1 code implementation • 24 Nov 2023 • Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, Fahad Shahbaz Khan

Furthermore, the lack of domain-specific multimodal instruction following data as well as strong backbone models for RS make it hard for the models to align their behavior with user queries.

Instruction Following Language Modelling +3

Enhancing Novel Object Detection via Cooperative Foundational Models

1 code implementation • 19 Nov 2023 • Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

We present a novel approach to transform existing closed-set detectors into open-set detectors.

Ranked #1 on Novel Object Detection on LVIS v1.0 val

Novel Class Discovery Novel Object Detection +3

Cal-DETR: Calibrated Detection Transformer

1 code implementation • NeurIPS 2023 • Muhammad Akhtar Munir, Salman Khan, Muhammad Haris Khan, Mohsen Ali, Fahad Shahbaz Khan

Third, we develop a logit mixing approach that acts as a regularizer with detection-specific losses and is also complementary to the uncertainty-guided logit modulation technique to further improve the calibration performance.

Decision Making

Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

no code implementations • NeurIPS 2023 • Jameel Hassan, Hanan Gani, Noor Hussein, Muhammad Uzair Khattak, Muzammal Naseer, Fahad Shahbaz Khan, Salman Khan

The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption using prompt learning for numerous downstream tasks.

Domain Generalization Zero-shot Generalization

Videoprompter: an ensemble of foundational models for zero-shot video understanding

no code implementations • 23 Oct 2023 • Adeel Yousaf, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

Consistent improvements across multiple benchmarks and with various VLMs demonstrate the effectiveness of our proposed framework.

Ranked #2 on Video-Text Retrieval on Test-of-Time

Action Recognition Descriptive +3

Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation

1 code implementation • ICCV 2023 • Nian Liu, Kepan Nan, Wangbo Zhao, Yuanwei Liu, Xiwen Yao, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Junwei Han, Fahad Shahbaz Khan

We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively.

Image Segmentation Segmentation +3

Unsupervised Landmark Discovery Using Consistency Guided Bottleneck

1 code implementation • 19 Sep 2023 • Mamona Awan, Muhammad Haris Khan, Sanoojan Baliah, Muhammad Ahmad Waseem, Salman Khan, Fahad Shahbaz Khan, Arif Mahmood

In the current work, we introduce a consistency-guided bottleneck in an image reconstruction-based pipeline that leverages landmark consistency, a measure of compatibility score with the pseudo-ground truth to generate adaptive heatmaps.

Image Reconstruction

A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos

1 code implementation • 9 Sep 2023 • Chao Qin, Jiale Cao, Huazhu Fu, Rao Muhammad Anwer, Fahad Shahbaz Khan

Existing video-based breast lesion detection approaches typically perform temporal feature aggregation of deep backbone features based on the self-attention operation.

Decoder Lesion Detection

Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement

1 code implementation • 30 Aug 2023 • Basit Alawode, Fayaz Ali Dharejo, Mehnaz Ummar, Yuhang Guo, Arif Mahmood, Naoufel Werghi, Fahad Shahbaz Khan, Jiri Matas, Sajid Javed

The method has resulted in a significant performance improvement, of up to 5.0% AUC, of state-of-the-art (SOTA) visual trackers.

Image Enhancement Visual Object Tracking +1

How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges

1 code implementation • 27 Jul 2023 • Haotong Qin, Ge-Peng Ji, Salman Khan, Deng-Ping Fan, Fahad Shahbaz Khan, Luc van Gool

Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI.

Foundational Models Defining a New Era in Vision: A Survey and Outlook

1 code implementation • 25 Jul 2023 • Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.

Benchmarking

Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation

1 code implementation • 14 Jul 2023 • Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks.

Adversarial Attack Image Segmentation +3

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

2 code implementations • ICCV 2023 • Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.

Ranked #2 on Prompt Engineering on ImageNet V2

Prompt Engineering

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

2 code implementations • ICCV 2023 • Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

Video transformer designs are based on self-attention that can model global context at a high computational cost.

Ranked #1 on Action Recognition on Diving-48

Action Recognition Temporal Action Localization +1

PromptIR: Prompting for All-in-One Blind Image Restoration

1 code implementation • 22 Jun 2023 • Vaishnav Potlapalli, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan

We present a prompt-based learning approach, PromptIR, for All-In-One image restoration that can effectively restore images from various types and levels of degradation.

Image Denoising Image Restoration +1

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

1 code implementation • 21 Jun 2023 • Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Radu Tudor Ionescu, Marius Popescu, Fahad Shahbaz Khan, Mubarak Shah

We propose an efficient abnormal event detection model based on a lightweight masked auto-encoder (AE) applied at the video frame level.

Ranked #13 on Anomaly Detection on CUHK Avenue

Anomaly Detection Decoder +1

Learnable Weight Initialization for Volumetric Medical Image Segmentation

1 code implementation • 15 Jun 2023 • Shahina Kunhimon, Abdelrahman Shaker, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention.

Image Segmentation Organ Segmentation +3

XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

1 code implementation • 13 Jun 2023 • Omkar Thawakar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz Khan

The latest breakthroughs in large vision-language models, such as Bard and GPT-4, have showcased extraordinary abilities in performing a wide range of tasks.

Language Modelling Large Language Model

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

1 code implementation • 8 Jun 2023 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan

Conversation agents fueled by Large Language Models (LLMs) are providing a new way to interact with visual data.

Ranked #3 on Question Answering on NExT-QA (Open-ended VideoQA)

Video-based Generative Performance Benchmarking (Consistency) Video-based Generative Performance Benchmarking (Contextual Understanding) +5

DFormer: Diffusion-guided Transformer for Universal Image Segmentation

1 code implementation • 6 Jun 2023 • Hefeng Wang, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on the MS COCO val2017 set.

Decoder Denoising +4

Modulate Your Spectrum in Self-Supervised Learning

1 code implementation • 26 May 2023 • Xi Weng, Yunhao Ni, Tengwei Song, Jie Luo, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan, Lei Huang

In this work, we introduce Spectral Transformation (ST), a framework to modulate the spectrum of the embedding and to seek functions beyond whitening that can avoid dimensional collapse.

object-detection Object Detection +1

Salient Mask-Guided Vision Transformer for Fine-Grained Classification

1 code implementation • 11 May 2023 • Dmitry Demidov, Muhammad Hamza Sharif, Aliakbar Abdurahimov, Hisham Cholakkal, Fahad Shahbaz Khan

Fine-grained visual classification (FGVC) is a challenging computer vision problem, where the task is to automatically recognise objects from subordinate categories.

Classification Fine-Grained Image Classification

Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection

1 code implementation • CVPR 2023 • Long Li, Junwei Han, Ni Zhang, Nian Liu, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan

Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation and co-saliency token-to-token correlation modules.

Ranked #1 on Co-Salient Object Detection on CoSal2015

Computational Efficiency Co-Salient Object Detection +3

Remote Sensing Change Detection With Transformers Trained from Scratch

1 code implementation • 13 Apr 2023 • Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

Current transformer-based change detection (CD) approaches either employ a pre-trained model trained on large-scale image classification ImageNet dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark.

Change Detection Image Classification

Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement

1 code implementation • CVPR 2023 • Nancy Mehta, Akshay Dudhane, Subrahmanyam Murala, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan

Burst image processing has become increasingly popular in recent years.

Denoising Super-Resolution

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

1 code implementation • CVPR 2023 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.

Action Recognition Video Classification +2

Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification

1 code implementation • 4 Apr 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan

In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues.

Data Augmentation Image Classification +1

Video Instance Segmentation in an Open-World

1 code implementation • 3 Apr 2023 • Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

The open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories and labels an unknown object as 'unknown', and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available.

Instance Segmentation Semantic Segmentation +1

Generative Multiplane Neural Radiance for 3D-Aware Image Generation

1 code implementation • ICCV 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views.

Computational Efficiency Image Generation

Burstormer: Burst Image Restoration and Enhancement Transformer

1 code implementation • CVPR 2023 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

Unlike existing methods, the proposed alignment module not only aligns burst features but also exchanges feature information and maintains focused communication with the reference frame through the proposed reference-based feature enrichment mechanism, which facilitates handling complex motions.

Denoising Image Restoration +1

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

1 code implementation • 28 Mar 2023 • Senmao Li, Joost Van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang

A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images.

Ranked #7 on Text-based Image Editing on PIE-Bench

Text-based Image Editing

3D-Aware Multi-Class Image-to-Image Translation with NeRFs

1 code implementation • CVPR 2023 • Senmao Li, Joost Van de Weijer, Yaxing Wang, Fahad Shahbaz Khan, Meiqin Liu, Jian Yang

In the second step, based on the well-trained multi-class 3D-aware GAN architecture that preserves view-consistency, we construct a 3D-aware I2I translation system.

Image-to-Image Translation Translation

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

2 code implementations • ICCV 2023 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" which achieve state-of-the-art performance in terms of both accuracy and mobile inference speed.

Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection

1 code implementation • CVPR 2023 • Muhammad Akhtar Munir, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan

Since the original formulation of our loss depends on the counts of true positives and false positives in a minibatch, we develop a differentiable proxy of our loss that can be used during training with other application-specific loss functions.

object-detection Object Detection

3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers

1 code implementation • 21 Mar 2023 • Omkar Thawakar, Rao Muhammad Anwer, Jorma Laaksonen, Orly Reiner, Mubarak Shah, Fahad Shahbaz Khan

Accurate 3D mitochondria instance segmentation in electron microscopy (EM) is a challenging problem and serves as a prerequisite to empirically analyze their distributions and morphology.

Decoder Instance Segmentation +1

3D Instance Segmentation via Enhanced Spatial and Semantic Supervision

no code implementations • ICCV 2023 • Salwa Al Khatib, Mohamed El Amine Boudjoghra, Jean Lahoud, Fahad Shahbaz Khan

Specifically, we provide the transformer block with spatial features to facilitate differentiation between similar object queries and incorporate semantic supervision to enhance prediction accuracy based on object class.

3D Instance Segmentation Decoder +2

Guidance Through Surrogate: Towards a Generic Diagnostic Attack

no code implementations • 30 Dec 2022 • Muzammal Naseer, Salman Khan, Fatih Porikli, Fahad Shahbaz Khan

Recently, different adversarial training defenses have been proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD.

Adversarial Robustness

UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

2 code implementations • 8 Dec 2022 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.

Image Segmentation Medical Image Segmentation +2

Fine-tuned CLIP Models are Efficient Video Learners

1 code implementation • CVPR 2023 • Hanoona Rasheed, Muhammad Uzair Khattak, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan

Since training on a similar scale for videos is infeasible, recent approaches focus on the effective transfer of image-based CLIP to the video domain.

Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation

no code implementations • 28 Nov 2022 • Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Dana Dascalescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah

We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models.

Ranked #16 on Anomaly Detection on CUHK Avenue

Anomaly Detection Knowledge Distillation +1

Person Image Synthesis via Denoising Diffusion Model

1 code implementation • CVPR 2023 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution.

Denoising Image Generation

An Investigation into Whitening Loss for Self-supervised Learning

1 code implementation • 7 Oct 2022 • Xi Weng, Lei Huang, Lei Zhao, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

A desirable objective in self-supervised learning (SSL) is to avoid feature collapse.

object-detection Object Detection +1

PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person Search

1 code implementation • 7 Oct 2022 • Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Fahad Shahbaz Khan

Our PS-ARM achieves state-of-the-art performance on both datasets.

Human Detection Person Search +1

MaPLe: Multi-modal Prompt Learning

2 code implementations • CVPR 2023 • Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan

Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks.

Ranked #2 on Prompt Engineering on ImageNet-A

Prompt Engineering

Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection

1 code implementation • 25 Sep 2022 • Neelu Madan, Nicolae-Catalin Ristea, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.

Ranked #4 on Anomaly Detection on CUHK Avenue

Event Detection Fault Detection +1

CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection

no code implementations • 13 Sep 2022 • Dhanalaxmi Gaddam, Jean Lahoud, Fahad Shahbaz Khan, Rao Muhammad Anwer, Hisham Cholakkal

In this work, we propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework, which takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene at multiple levels to predict a set of object bounding-boxes along with their corresponding semantic labels.

3D Object Detection Object +2

Transformers in Remote Sensing: A Survey

no code implementations • 2 Sep 2022 • Abdulaziz Amer Aleissaee, Amandeep Kumar, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal, Gui-Song Xia, Fahad Shahbaz Khan

Deep learning-based algorithms have seen a massive popularity in different areas of remote sensing image analysis over the past decade.

Paper
Add Code

AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility

1 code implementation • 14 Aug 2022 • Mubashir Noman, Wafa Al Ghallabi, Daniya Najiha, Christoph Mayer, Akshay Dudhane, Martin Danelljan, Hisham Cholakkal, Salman Khan, Luc van Gool, Fahad Shahbaz Khan

While greatly benefiting tracking research, existing benchmarks no longer pose the same difficulty as before, with recent trackers achieving higher performance mainly due to (i) the introduction of more sophisticated transformer-based methods and (ii) the lack of diverse scenarios with adverse visibility, such as severe weather conditions, camouflage, and imaging effects.

Visual Object Tracking Visual Tracking

3,109

Paper
Code

Multi-scale Feature Aggregation for Crowd Counting

no code implementations • 10 Aug 2022 • Xiaoheng Jiang, Xinyi Wu, Hisham Cholakkal, Rao Muhammad Anwer, Jiale Cao, Mingliang Xu, Bing Zhou, Yanwei Pang, Fahad Shahbaz Khan

The SkipAgg module directly propagates features with small receptive fields to features with much larger receptive fields.

Crowd Counting

Paper
Add Code

3D Vision with Transformers: A Survey

1 code implementation • 8 Aug 2022 • Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field.

Pose Estimation

372

Paper
Code

Self-Distilled Vision Transformer for Domain Generalization

2 code implementations • 25 Jul 2022 • Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan

Similar to CNNs, ViTs also struggle in out-of-distribution scenarios and the main culprit is overfitting to source domains.

Domain Generalization

Paper
Code

Adversarial Pixel Restoration as a Pretext Task for Transferable Perturbations

1 code implementation • 18 Jul 2022 • Hashmat Shadab Malik, Shahina K Kunhimon, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Our training approach is based on a min-max scheme which reduces overfitting via an adversarial objective and thus optimizes for a more generalizable surrogate model.

object-detection Object Detection +2

Paper
Code

SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection

no code implementations • 16 Jul 2022 • Antonio Barbalau, Radu Tudor Ionescu, Mariana-Iuliana Georgescu, Jacob Dueholm, Bharathkumar Ramachandra, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in literature.

Ranked #2 on Anomaly Detection on CUHK Avenue

Anomaly Detection Knowledge Distillation +4

Paper
Add Code

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

1 code implementation • 7 Jul 2022 • Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, Fahad Shahbaz Khan

Two popular forms of weak-supervision used in open-vocabulary detection (OVD) include pretrained CLIP model and image-level supervision.

Ranked #1 on Open Vocabulary Object Detection on OpenImages-v4

Object Open Vocabulary Attribute Detection +1

294

Paper
Code

OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning

1 code implementation • 5 Jul 2022 • Mamshad Nayeem Rizve, Navid Kardan, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

In the open-world SSL problem, the objective is to recognize samples of known classes, and simultaneously detect and cluster samples belonging to novel classes present in unlabeled data.

Ranked #1 on Open-World Semi-Supervised Learning on CIFAR-10

Open-World Semi-Supervised Learning

Paper
Code

EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

7 code implementations • 21 Jun 2022 • Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman Khan, Syed Waqas Zamir, Rao Muhammad Anwer, Fahad Shahbaz Khan

Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% with 28% reduction in FLOPs.

Ranked #29 on Semantic Segmentation on PASCAL VOC 2012 test

Image Classification Object Detection +1

30,048

Paper
Code

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

2 code implementations • 11 May 2022 • Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00 dB on DIV2K validation set.

Image Super-Resolution

117

Paper
Code

Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging

no code implementations • 22 Apr 2022 • Jyoti Kini, Fahad Shahbaz Khan, Salman Khan, Mubarak Shah

We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability for accurate object segmentation.

Object Segmentation +4

Paper
Add Code

Learning Enriched Features for Fast Image Restoration and Enhancement

1 code implementation • 19 Apr 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

In the former case, spatial details are preserved but the contextual information cannot be precisely encoded.

Autonomous Vehicles Deblurring +4

372

Paper
Code

Multimodal Multi-Head Convolutional Attention with Various Kernel Sizes for Medical Image Super-Resolution

1 code implementation • 8 Apr 2022 • Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Andreea-Iuliana Miron, Olivian Savencu, Nicolae-Catalin Ristea, Nicolae Verga, Fahad Shahbaz Khan

Our attention module uses the convolution operation to perform joint spatial-channel attention on multiple concatenated input tensors, where the kernel (receptive field) size controls the reduction rate of the spatial attention, and the number of convolutional filters controls the reduction rate of the channel attention, respectively.

Ranked #1 on Image Super-Resolution on IXI

Computed Tomography (CT) Image Super-Resolution

Paper
Code

PSTR: End-to-End One-Step Person Search With Transformers

1 code implementation • CVPR 2022 • Jiale Cao, Yanwei Pang, Rao Muhammad Anwer, Hisham Cholakkal, Jin Xie, Mubarak Shah, Fahad Shahbaz Khan

We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture.

Decoder Human Detection +1

Paper
Code

Energy-based Latent Aligner for Incremental Learning

2 code implementations • CVPR 2022 • K J Joseph, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Vineeth N Balasubramanian

Deep learning models tend to forget their earlier knowledge while incrementally learning new tasks.

Class Incremental Learning Incremental Learning

Paper
Code

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

1 code implementation • 24 Mar 2022 • Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan

When using the ResNet50 backbone, our MS-STS achieves a mask AP of 50.1%, outperforming the best reported results in literature by 2.7% and by 4.8% at higher overlap threshold of AP_75, while being comparable in model size and speed on Youtube-VIS 2019 val.

Instance Segmentation Semantic Segmentation +2

Paper
Code

SepTr: Separable Transformer for Audio Spectrogram Processing

1 code implementation • 17 Mar 2022 • Nicolae-Catalin Ristea, Radu Tudor Ionescu, Fahad Shahbaz Khan

Following the successful application of vision transformers in multiple computer vision tasks, these models have drawn the attention of the signal processing community.

Ranked #1 on Time Series Analysis on Speech Commands

Audio Classification Speech Emotion Recognition +1

Paper
Code

Transformers in Medical Imaging: A Survey

1 code implementation • 24 Jan 2022 • Fahad Shamshad, Salman Khan, Syed Waqas Zamir, Muhammad Haris Khan, Munawar Hayat, Fahad Shahbaz Khan, Huazhu Fu

Following unprecedented success on the natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators.

Image Classification Image Segmentation +6

1,120

Paper
Code

Spatio-temporal Relation Modeling for Few-shot Action Recognition

1 code implementation • CVPR 2022 • Anirudh Thatipelli, Sanath Narayan, Salman Khan, Rao Muhammad Anwer, Fahad Shahbaz Khan, Bernard Ghanem

Experiments are performed on four few-shot action recognition benchmarks: Kinetics, SSv2, HMDB51 and UCF101.

Ranked #1 on Few Shot Action Recognition on UCF101 (using extra training data)

Few-Shot action recognition Few Shot Action Recognition +1

Paper
Code

DoodleFormer: Creative Sketch Drawing with Transformers

no code implementations • 6 Dec 2021 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jorma Laaksonen, Michael Felsberg

Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects.

Decoder Image Generation

Paper
Add Code

Visual Object Tracking with Discriminative Filters and Siamese Networks: A Survey and Outlook

no code implementations • 6 Dec 2021 • Sajid Javed, Martin Danelljan, Fahad Shahbaz Khan, Muhammad Haris Khan, Michael Felsberg, Jiri Matas

Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems.

Visual Object Tracking Visual Tracking

Paper
Add Code

Self-supervised Video Transformer

1 code implementation • CVPR 2022 • Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo

To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT).

Ranked #55 on Action Recognition on UCF101

Action Classification Action Recognition In Videos

Paper
Code

OW-DETR: Open-world Detection Transformer

2 code implementations • CVPR 2022 • Akshita Gupta, Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

In the case of incremental object detection, OW-DETR outperforms the state-of-the-art for all settings on PASCAL VOC.

Inductive Bias Object +3

221

Paper
Code

Class-agnostic Object Detection with Multi-modal Transformer

1 code implementation • 22 Nov 2021 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang

This has been a long-standing question in computer vision.

Ranked #1 on Open World Object Detection on COCO 2017 (Outdoor, Accessories, Appliance, Truck)

Class-agnostic Object Detection Object +3

294

Paper
Code

Restormer: Efficient Transformer for High-Resolution Image Restoration

11 code implementations • CVPR 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks.

Ranked #1 on Grayscale Image Denoising on Urban100 sigma15

Color Image Denoising Deblurring +7

1,569

Paper
Code

Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection

4 code implementations • CVPR 2022 • Nicolae-Catalin Ristea, Neelu Madan, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field.

Ranked #1 on Anomaly Detection on CUHK Avenue (TBDC metric)

One-Class Classification

1,684

Paper
Code

UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection

1 code implementation • CVPR 2022 • Andra Acsintoae, Andrei Florescu, Mariana-Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah

This is a closed-set scenario that fails to test the capability of systems at detecting new anomaly types.

Ranked #5 on Anomaly Detection on CUHK Avenue (using extra training data)

Action Recognition Event Detection +2

Paper
Code

CyTran: A Cycle-Consistent Transformer with Multi-Level Consistency for Non-Contrast to Contrast CT Translation

1 code implementation • 12 Oct 2021 • Nicolae-Catalin Ristea, Andreea-Iuliana Miron, Olivian Savencu, Mariana-Iuliana Georgescu, Nicolae Verga, Fahad Shahbaz Khan, Radu Tudor Ionescu

Our neural model can be trained on unpaired images, due to the integration of a multi-level cycle-consistency loss.

Computed Tomography (CT) Style Transfer +1

Paper
Code

Dense Gaussian Processes for Few-Shot Segmentation

1 code implementation • 7 Oct 2021 • Joakim Johnander, Johan Edstedt, Michael Felsberg, Fahad Shahbaz Khan, Martin Danelljan

Given the support set, our dense GP learns the mapping from local deep image features to mask values, capable of capturing complex appearance distributions.

Ranked #1 on Few-Shot Semantic Segmentation on COCO-20i (10-shot)

Decoder Few-Shot Semantic Segmentation +2

Paper
Code

Burst Image Restoration and Enhancement

1 code implementation • CVPR 2022 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

Our central idea is to create a set of pseudo-burst features that combine complementary information from all the input burst frames to seamlessly exchange information.

Ranked #2 on Burst Image Super-Resolution on BurstSR

Burst Image Super-Resolution Denoising +3

130

Paper
Code

Discriminative Region-based Multi-Label Zero-Shot Learning

1 code implementation • ICCV 2021 • Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah

We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features with a common set of attention maps for all the classes.

Ranked #2 on Multi-label zero-shot learning on Open Images V4

Image Retrieval Multi-label zero-shot learning

Paper
Code

Context-Conditional Adaptation for Recognizing Unseen Classes in Unseen Domains

no code implementations • 15 Jul 2021 • Puneet Mangla, Shivam Chandhok, Vineeth N Balasubramanian, Fahad Shahbaz Khan

Recent progress towards designing models that can generalize to unseen domains (i.e., domain generalization) or unseen classes (i.e., zero-shot learning) has sparked interest in building models that can tackle both domain shift and semantic shift simultaneously (i.e., zero-shot domain generalization).

Ranked #1 on Zero-Shot Learning + Domain Generalization on DomainNet

Domain Generalization Zero-Shot Learning +1

Paper
Add Code

Structured Latent Embeddings for Recognizing Unseen Classes in Unseen Domains

no code implementations • 12 Jul 2021 • Shivam Chandhok, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Vineeth N Balasubramanian, Fahad Shahbaz Khan, Ling Shao

The need to address the scarcity of task-specific annotated data has resulted in concerted efforts in recent years for specific settings such as zero-shot learning (ZSL) and domain generalization (DG), to separately address the issues of semantic shift and domain shift, respectively.

Ranked #2 on Zero-Shot Learning + Domain Generalization on DomainNet

Domain Generalization Zero-Shot Learning +1

Paper
Add Code

On Improving Adversarial Transferability of Vision Transformers

3 code implementations • ICLR 2022 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli

(ii) Token Refinement: We then propose to refine the tokens to further enhance the discriminative capacity at each block of ViT.

Adversarial Attack

147

Paper
Code

Intriguing Properties of Vision Transformers

1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e.g., retain as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.

Few-Shot Learning Semantic Segmentation

173

Paper
Code

MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains

1 code implementation • 28 Apr 2021 • Yaxing Wang, Abel Gonzalez-Garcia, Chenshen Wu, Luis Herranz, Fahad Shahbaz Khan, Shangling Jui, Joost Van de Weijer

Therefore, we propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs.

Transfer Learning

150

Paper
Code

Rich Semantics Improve Few-shot Learning

no code implementations • 26 Apr 2021 • Mohamed Afham, Salman Khan, Muhammad Haris Khan, Muzammal Naseer, Fahad Shahbaz Khan

Human learning benefits from multi-modal inputs that often appear as rich semantics (e.g., description of an object's attributes while learning about it).

Ranked #1 on Few-Shot Image Classification on Oxford 102 Flower (using extra training data)

Few-Shot Image Classification Few-Shot Learning

Paper
Add Code

Handwriting Transformers

1 code implementation • ICCV 2021 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Mubarak Shah

We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local writing style patterns.

Decoder Image Generation +1

157

Paper
Code

Deep Gaussian Processes for Few-Shot Segmentation

no code implementations • 30 Mar 2021 • Joakim Johnander, Johan Edstedt, Martin Danelljan, Michael Felsberg, Fahad Shahbaz Khan

Through the expressivity of the GP, our approach is capable of modeling complex appearance distributions in the deep feature space.

Decoder Gaussian Processes +1

Paper
Add Code

On Generating Transferable Targeted Perturbations

3 code implementations • ICCV 2021 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

To this end, we propose a new objective function that not only aligns the global distributions of source and target images, but also matches the local neighbourhood structure between the two domains.

Paper
Code

Orthogonal Projection Loss

1 code implementation • ICCV 2021 • Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan

The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes.

Domain Generalization Few-Shot Learning

111

Paper
Code

Towards Open World Object Detection

2 code implementations • CVPR 2021 • K J Joseph, Salman Khan, Fahad Shahbaz Khan, Vineeth N Balasubramanian

Humans have a natural instinct to identify unknown object instances in their environments.

Ranked #2 on Open World Object Detection on COCO 2017 (Electronic, Indoor, Kitchen, Furniture)

Clustering Object +2

1,011

Paper
Code

Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning

1 code implementation • CVPR 2021 • Mamshad Nayeem Rizve, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

Equivariance or invariance has been employed standalone in the previous works; however, to the best of our knowledge, they have not been used jointly.

Ranked #8 on Few-Shot Image Classification on FC100 5-way (5-shot)

Few-Shot Image Classification Few-Shot Learning +2

Paper
Code

Multi-Stage Progressive Image Restoration

8 code implementations • CVPR 2021 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

At each stage, we introduce a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features.

Ranked #3 on Spectral Reconstruction on ARAD-1K

Deblurring Decoder +4

1,569

Paper
Code

Generative Multi-Label Zero-Shot Learning

1 code implementation • 27 Jan 2021 • Akshita Gupta, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Joost Van de Weijer

Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge.

Ranked #8 on Multi-label zero-shot learning on NUS-WIDE

Attribute Generative Adversarial Network +3

Paper
Code

Transformers in Vision: A Survey

no code implementations • 4 Jan 2021 • Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems.

Action Recognition Colorization +10

Paper
Add Code

Low Light Image Enhancement via Global and Local Context Modeling

no code implementations • 4 Jan 2021 • Aditya Arora, Muhammad Haris, Syed Waqas Zamir, Munawar Hayat, Fahad Shahbaz Khan, Ling Shao, Ming-Hsuan Yang

These contexts can be crucial for several image enhancement tasks, e.g., local and global contrast, brightness, and color corrections, which require cues from both local and global spatial extent.

Low-Light Image Enhancement

Paper
Add Code

D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations

1 code implementation • ICCV 2021 • Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization.

Ranked #3 on Weakly Supervised Action Localization on THUMOS’14

Denoising Weakly Supervised Action Localization +2

Paper
Code

Learning to Fuse Asymmetric Feature Maps in Siamese Trackers

1 code implementation • CVPR 2021 • Wencheng Han, Xingping Dong, Fahad Shahbaz Khan, Ling Shao, Jianbing Shen

We propose a learnable module, called the asymmetric convolution (ACM), which learns to better capture the semantic correlation information in offline training on large-scale data.

Ranked #22 on Visual Object Tracking on TrackingNet

Visual Object Tracking Visual Tracking

Paper
Code

Anomaly Detection in Video via Self-Supervised and Multi-Task Learning

1 code implementation • CVPR 2021 • Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah

To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture.

Ranked #2 on Anomaly Detection on UCSD Peds2

Abnormal Event Detection In Video Anomaly Detection In Surveillance Videos +4

Paper
Code

Synthesizing the Unseen for Zero-shot Object Detection

2 code implementations • 19 Oct 2020 • Nasir Hayat, Munawar Hayat, Shafin Rahman, Salman Khan, Syed Waqas Zamir, Fahad Shahbaz Khan

The existing zero-shot detection approaches project visual features to the semantic domain for seen objects, hoping to map unseen objects to their corresponding semantics during inference.

Ranked #1 on Zero-Shot Object Detection on ImageNet Detection

Generalized Zero-Shot Object Detection Object +1

Paper
Code

Meta-learning the Learning Trends Shared Across Tasks

no code implementations • 19 Oct 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

This demonstrates their ability to acquire transferable knowledge, a capability that is central to human learning.

Meta-Learning

Paper
Add Code

From Handcrafted to Deep Features for Pedestrian Detection: A Survey

2 code implementations • 1 Oct 2020 • Jiale Cao, Yanwei Pang, Jin Xie, Fahad Shahbaz Khan, Ling Shao

In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection, which provides more robust features for illumination variance.

Pedestrian Detection

166

Paper
Code

AIM 2020 Challenge on Real Image Super-Resolution: Methods and Results

no code implementations • 25 Sep 2020 • Pengxu Wei, Hannan Lu, Radu Timofte, Liang Lin, WangMeng Zuo, Zhihong Pan, Baopu Li, Teng Xi, Yanwen Fan, Gang Zhang, Jingtuo Liu, Junyu Han, Errui Ding, Tangxin Xie, Liang Cao, Yan Zou, Yi Shen, Jialiang Zhang, Yu Jia, Kaihua Cheng, Chenhuan Wu, Yue Lin, Cen Liu, Yunbo Peng, Xueyi Zou, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Tongtong Zhao, Shanshan Zhao, Yoseob Han, Byung-Hoon Kim, JaeHyun Baek, HaoNing Wu, Dejia Xu, Bo Zhou, Wei Guan, Xiaobo Li, Chen Ye, Hao Li, Yukai Shi, Zhijing Yang, Xiaojun Yang, Haoyu Zhong, Xin Li, Xin Jin, Yaojun Wu, Yingxue Pang, Sen Liu, Zhi-Song Liu, Li-Wen Wang, Chu-Tak Li, Marie-Paule Cani, Wan-Chi Siu, Yuanbo Zhou, Rao Muhammad Umer, Christian Micheloni, Xiaofeng Cong, Rajat Gupta, Keon-Hee Ahn, Jun-Hyuk Kim, Jun-Ho Choi, Jong-Seok Lee, Feras Almasri, Thomas Vandamme, Olivier Debeir

This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020.

Image Manipulation Image Super-Resolution +1

Paper
Add Code

A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video

2 code implementations • 27 Aug 2020 • Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah

Following the standard formulation of abnormal event detection as outlier detection, we propose a background-agnostic framework that learns from training videos containing only normal events.

Ranked #1 on Anomaly Detection In Surveillance Videos on UCSD Peds2

Abnormal Event Detection In Video Anomaly Detection In Surveillance Videos +2

Paper
Code

Image Colorization: A Survey and Dataset

1 code implementation • 25 Aug 2020 • Saeed Anwar, Muhammad Tahir, Chongyi Li, Ajmal Mian, Fahad Shahbaz Khan, Abdul Wahab Muzaffar

Image colorization is the process of estimating RGB colors for grayscale images or video frames to improve their aesthetic and perceptual quality.

Benchmarking Colorization +1

Paper
Code

Stylized Adversarial Defense

1 code implementation • 29 Jul 2020 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

In contrast to existing adversarial training methods that only use class-boundary information (e.g., using a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries that are in turn used to learn a robust model.

Adversarial Defense

Paper
Code

SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

1 code implementation • ECCV 2020 • Jiale Cao, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao

In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp.

Ranked #12 on Real-time Instance Segmentation on MSCOCO

object-detection Object Detection +4

334

Paper
Code

Self-supervised Knowledge Distillation for Few-shot Learning

1 code implementation • 17 Jun 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods, with further gains achieved by our second stage distillation process.

Ranked #12 on Few-Shot Image Classification on FC100 5-way (5-shot)

Few-Shot Image Classification Few-Shot Learning +2

Paper
Code

A Self-supervised Approach for Adversarial Robustness

2 code implementations • CVPR 2020 • Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems, e.g., for classification, segmentation and object detection.

Adversarial Robustness General Classification +3

Paper
Code

Learning Human-Object Interaction Detection using Interaction Points

1 code implementation • CVPR 2020 • Tiancai Wang, Tong Yang, Martin Danelljan, Fahad Shahbaz Khan, Xiangyu Zhang, Jian Sun

Human-object interaction (HOI) detection strives to localize both the human and an object as well as the identification of complex interactions between them.

Human-Object Interaction Detection Keypoint Detection +2

Paper
Code

Semi-supervised Learning for Few-shot Image-to-Image Translation

1 code implementation • CVPR 2020 • Yaxing Wang, Salman Khan, Abel Gonzalez-Garcia, Joost Van de Weijer, Fahad Shahbaz Khan

In this work, we go one step further and reduce the amount of required labeled data also from the source domain during training.

Image-to-Image Translation Translation

Paper
Code

iTAML: An Incremental Task-Agnostic Meta-learning Approach

1 code implementation • CVPR 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

In this paper, we hypothesize this problem can be avoided by learning a set of generalized parameters, that are neither specific to old nor new tasks.

Incremental Learning Meta-Learning

Paper
Code

Incremental Object Detection via Meta-Learning

2 code implementations • 17 Mar 2020 • K J Joseph, Jathushan Rajasegaran, Salman Khan, Fahad Shahbaz Khan, Vineeth N Balasubramanian

In a real-world setting, object instances from new classes can be continuously encountered by object detectors.

Incremental Learning Knowledge Distillation +5

112

Paper
Code

CycleISP: Real Image Restoration via Improved Data Synthesis

8 code implementations • CVPR 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

This is mainly because the AWGN is not adequate for modeling the real camera noise which is signal-dependent and heavily transformed by the camera imaging pipeline.

Ranked #10 on Image Denoising on DND (using extra training data)

Image Denoising Image Restoration

1,569

Paper
Code

Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification

1 code implementation • ECCV 2020 • Sanath Narayan, Akshita Gupta, Fahad Shahbaz Khan, Cees G. M. Snoek, Ling Shao

We propose to enforce semantic consistency at all stages of (generalized) zero-shot learning: training, feature synthesis and classification.

Ranked #2 on Generalized Zero-Shot Learning on CUB-200-2011

Action Classification Classification +3

127

Paper
Code

Any-Shot Object Detection

no code implementations • 16 Mar 2020 • Shafin Rahman, Salman Khan, Nick Barnes, Fahad Shahbaz Khan

Any-shot detection offers unique challenges compared to conventional novel object detection, such as a high imbalance between unseen, few-shot and seen object classes, susceptibility to forgetting base training while learning novel classes, and distinguishing novel classes from the background.

Novel Object Detection Object +2

Paper
Add Code

Learning Enriched Features for Real Image Restoration and Enhancement

12 code implementations • ECCV 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography, medical imaging, and remote sensing.

Ranked #5 on Spectral Reconstruction on ARAD-1K

Image Denoising Image Enhancement +2

1,569

Paper
Code

Learning Fast and Robust Target Models for Video Object Segmentation

2 code implementations • CVPR 2020 • Andreas Robinson, Felix Järemo Lawin, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg

The target appearance model consists of a light-weight module, which is learned during the inference stage using fast optimization techniques to predict a coarse but robust target segmentation.

Ranked #20 on Semi-Supervised Video Object Segmentation on DAVIS (no YouTube-VOS training)

One-shot visual object segmentation Segmentation +2

123

Paper
Code

PSC-Net: Learning Part Spatial Co-occurrence for Occluded Pedestrian Detection

no code implementations • 25 Jan 2020 • Jin Xie, Yanwei Pang, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Ling Shao

On the heavily occluded (HO) set of the CityPersons test set, our PSC-Net obtains an absolute gain of 4.0% in log-average miss rate over the state-of-the-art with the same backbone and input scale, and without using additional VBB supervision.

Pedestrian Detection

Paper
Add Code

Fine-grained Recognition: Accounting for Subtle Differences between Similar Classes

no code implementations • 14 Dec 2019 • Guolei Sun, Hisham Cholakkal, Salman Khan, Fahad Shahbaz Khan, Ling Shao

The main requisite for the fine-grained recognition task is to focus on subtle discriminative details that make the subordinate classes different from each other.

Ranked #16 on Fine-Grained Image Classification on Stanford Dogs

Fine-Grained Image Classification

Paper
Add Code

Towards Partial Supervision for Generic Object Counting in Natural Scenes

1 code implementation • 13 Dec 2019 • Hisham Cholakkal, Guolei Sun, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Luc van Gool

Our RLC framework further reduces the annotation cost arising from large numbers of object categories in a dataset by only using lower-count supervision for a subset of categories and class-labels for the remaining ones.

Image Classification Image-level Supervised Instance Segmentation +3

161

Paper
Code

MineGAN: effective knowledge transfer from GANs to target domains with few images

2 code implementations • CVPR 2020 • Yaxing Wang, Abel Gonzalez-Garcia, David Berga, Luis Herranz, Fahad Shahbaz Khan, Joost Van de Weijer

We propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs.

Transfer Learning

712

Paper
Code

Random Path Selection for Continual Learning

1 code implementation • NeurIPS 2019 • Jathushan Rajasegaran, Munawar Hayat, Salman H. Khan, Fahad Shahbaz Khan, Ling Shao

In order to maintain an equilibrium between previous and newly acquired knowledge, we propose a simple controller to dynamically balance the model plasticity.

Ranked #7 on Continual Learning on F-CelebA (10 tasks)

Continual Learning Incremental Learning +1

Paper
Code

Deep Contextual Attention for Human-Object Interaction Detection

no code implementations • ICCV 2019 • Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Jorma Laaksonen

Our approach outperforms the state-of-the-art on all datasets.

Human-Object Interaction Detection Object +3

Paper
Add Code

Mask-Guided Attention Network for Occluded Pedestrian Detection

1 code implementation • ICCV 2019 • Yanwei Pang, Jin Xie, Muhammad Haris Khan, Rao Muhammad Anwer, Fahad Shahbaz Khan, Ling Shao

Our approach obtains an absolute gain of 9.5% in log-average miss rate, compared to the best reported results on the heavily occluded (HO) pedestrian set of the CityPersons test set.

Pedestrian Detection

127

Paper
Code

AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces

1 code implementation • CVPR 2020 • Muhammad Haris Khan, John McDonagh, Salman Khan, Muhammad Shahabuddin, Aditya Arora, Fahad Shahbaz Khan, Ling Shao, Georgios Tzimiropoulos

Several studies show that animal needs are often expressed through their faces.

Face Alignment Face Detection

Paper
Code

Multi-Modal Fusion for End-to-End RGB-T Tracking

1 code implementation • 30 Aug 2019 • Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost Van de Weijer, Fahad Shahbaz Khan

Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities.

Ranked #10 on Rgb-T Tracking on RGBT210

Image-to-Image Translation Rgb-T Tracking

Paper
Code

3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization

1 code implementation • ICCV 2019 • Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao

Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance the action feature discriminability and a counting loss term to delineate adjacent action sequences, leading to improved localization.

Ranked #1 on Action Classification on THUMOS'14

Action Classification Weakly Supervised Action Localization +2

Paper
Code

Learning the Model Update for Siamese Trackers

1 code implementation • ICCV 2019 • Lichao Zhang, Abel Gonzalez-Garcia, Joost Van de Weijer, Martin Danelljan, Fahad Shahbaz Khan

In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time.

Visual Tracking

133

Paper
Code

Distilled Siamese Networks for Visual Tracking

no code implementations • 24 Jul 2019 • Jianbing Shen, Yuanpei Liu, Xingping Dong, Xiankai Lu, Fahad Shahbaz Khan, Steven Hoi

This model is intuitively inspired by the one teacher vs. multiple students learning method typically employed in schools.

Knowledge Distillation Object Tracking +1

Paper
Add Code

An Adaptive Random Path Selection Approach for Incremental Learning

1 code implementation • 3 Jun 2019 • Jathushan Rajasegaran, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Ming-Hsuan Yang

In a conventional supervised learning setting, a machine learning model has access to examples of all object classes that are desired to be recognized during the inference stage.

Ranked #7 on Incremental Learning on CIFAR-100-B0(5steps of 20 classes)

Incremental Learning Knowledge Distillation +1

Paper
Code

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images

3 code implementations • 30 May 2019 • Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai

Compared to existing small-scale aerial image based instance segmentation datasets, iSAID contains 15× the number of object categories and 5× the number of instances.

Ranked #1 on Object Detection on iSAID

Instance Segmentation Object +4

124

Paper
Code

Cross-Domain Transferability of Adversarial Perturbations

2 code implementations • NeurIPS 2019 • Muzammal Naseer, Salman H. Khan, Harris Khan, Fahad Shahbaz Khan, Fatih Porikli

To this end, we propose a framework capable of launching highly transferable attacks that crafts adversarial patterns to mislead networks trained on wholly different domains.

Paper
Code

Discriminative Online Learning for Fast Video Object Segmentation

no code implementations • 18 Apr 2019 • Andreas Robinson, Felix Järemo Lawin, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg

We propose a novel approach, based on a dedicated target appearance model that is exclusively learned online to discriminate between the target and background image regions.

Object One-shot visual object segmentation +4

Paper
Add Code

Out-of-Distribution Detection for Generalized Zero-Shot Action Recognition

1 code implementation • CVPR 2019 • Devraj Mandal, Sanath Narayan, Saikumar Dwivedi, Vikram Gupta, Shuaib Ahmed, Fahad Shahbaz Khan, Ling Shao

We introduce an out-of-distribution detector that determines whether the video features belong to a seen or unseen action category.

Action Recognition In Videos Out-of-Distribution Detection +2

Paper
Code

Learning Digital Camera Pipeline for Extreme Low-Light Imaging

no code implementations • 11 Apr 2019 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Fahad Shahbaz Khan, Ling Shao

In low-light conditions, a conventional camera imaging pipeline produces sub-optimal images that are usually dark and noisy due to a low photon count and low signal-to-noise ratio (SNR).

Paper
Add Code

Object Counting and Instance Segmentation with Image-level Supervision

2 code implementations • CVPR 2019 • Hisham Cholakkal, Guolei Sun, Fahad Shahbaz Khan, Ling Shao

Moreover, our approach improves state-of-the-art image-level supervised instance segmentation with a relative gain of 17.8% in terms of average best overlap on the PASCAL VOC 2012 dataset.

Ranked #1 on Object Counting on COCO count-test

Image-level Supervised Instance Segmentation Object +2

161

Paper
Code

Object-centric Auto-encoders and Dummy Anomalies for Abnormal Event Detection in Video

1 code implementation • CVPR 2019 • Radu Tudor Ionescu, Fahad Shahbaz Khan, Mariana-Iuliana Georgescu, Ling Shao

Most existing approaches formulate abnormal event detection as an outlier detection task, due to the scarcity of anomalous data during training.

Ranked #14 on Anomaly Detection on ShanghaiTech

Abnormal Event Detection In Video Binary Classification +4

Paper
Code

A Generative Appearance Model for End-to-end Video Object Segmentation

1 code implementation • CVPR 2019 • Joakim Johnander, Martin Danelljan, Emil Brissman, Fahad Shahbaz Khan, Michael Felsberg

One of the fundamental challenges in video object segmentation is to find an effective representation of the target and background appearance.

Ranked #52 on Semi-Supervised Video Object Segmentation on DAVIS 2017 (test-dev)

One-shot visual object segmentation Segmentation +2

Paper
Code

ATOM: Accurate Tracking by Overlap Maximization

3 code implementations • CVPR 2019 • Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg

We argue that this approach is fundamentally limited since target estimation is a complex task, requiring high-level knowledge about the object.

Ranked #7 on Object Tracking on FE108

General Classification Visual Object Tracking +1

3,109

Paper
Code

Confidence Propagation through CNNs for Guided Sparse Depth Regression

1 code implementation • 5 Nov 2018 • Abdelrahman Eldesokey, Michael Felsberg, Fahad Shahbaz Khan

In this paper, we propose an algebraically-constrained normalized convolution layer for CNNs with highly sparse input that has a smaller number of network parameters compared to related work.

Ranked #7 on Depth Completion on KITTI Depth Completion

Autonomous Driving Depth Completion +1

Paper
Code

Synthetic data generation for end-to-end thermal infrared tracking

no code implementations • 4 Jun 2018 • Lichao Zhang, Abel Gonzalez-Garcia, Joost Van de Weijer, Martin Danelljan, Fahad Shahbaz Khan

These methods provide us with a large labeled dataset of synthetic TIR sequences, on which we can train end-to-end optimal features for tracking.

Image-to-Image Translation Synthetic Data Generation +2

Paper
Add Code

Propagating Confidences through CNNs for Sparse Data Regression

1 code implementation • 30 May 2018 • Abdelrahman Eldesokey, Michael Felsberg, Fahad Shahbaz Khan

To tackle this challenging problem, we introduce an algebraically-constrained convolution layer for CNNs with sparse input and demonstrate its capabilities for the scene depth completion task.

Autonomous Driving Depth Completion +1

Paper
Code

Unveiling the Power of Deep Tracking

no code implementations • ECCV 2018 • Goutam Bhat, Joakim Johnander, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg

In the field of generic object tracking, numerous attempts have been made to exploit deep features.

Object Tracking

Paper
Add Code

Density Adaptive Point Set Registration

1 code implementation • CVPR 2018 • Felix Järemo Lawin, Martin Danelljan, Fahad Shahbaz Khan, Per-Erik Forssén, Michael Felsberg

Contrary to previous works, we model the underlying structure of the scene as a latent probability distribution, and thereby induce invariance to point set density changes.

Paper
Code

DCCO: Towards Deformable Continuous Convolution Operators

no code implementations • 9 Jun 2017 • Joakim Johnander, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg

Generally, DCF based trackers learn a rigid appearance model of the target.

Paper
Add Code

Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification

no code implementations • 5 Jun 2017 • Rao Muhammad Anwer, Fahad Shahbaz Khan, Joost Van de Weijer, Matthieu Molinier, Jorma Laaksonen

To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification.

Ranked #12 on Aerial Scene Classification on AID (20% as trainset)

Aerial Scene Classification General Classification +2

Paper
Add Code

Deep Projective 3D Semantic Segmentation

1 code implementation • 9 May 2017 • Felix Järemo Lawin, Martin Danelljan, Patrik Tosteberg, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg

Recent attempts, based on 3D deep learning approaches (3D-CNNs), have achieved below-expected results.

Ranked #15 on Semantic Segmentation on Semantic3D

Segmentation

389

Paper
Code

Deep Motion Features for Visual Tracking

no code implementations • 20 Dec 2016 • Susanna Gladh, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg

To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking.

Action Recognition Optical Flow Estimation +3

Paper
Add Code

Scale Coding Bag of Deep Features for Human Attribute and Action Recognition

no code implementations • 14 Dec 2016 • Fahad Shahbaz Khan, Joost Van de Weijer, Rao Muhammad Anwer, Andrew D. Bagdanov, Michael Felsberg, Jorma Laaksonen

Most approaches to human attribute and action recognition in still images are based on image representation in which multi-scale local features are pooled across scale into a single, scale-invariant encoding.

Action Recognition In Still Images Attribute

Paper
Add Code

ECO: Efficient Convolution Operators for Tracking

5 code implementations • CVPR 2017 • Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg

Moreover, our fast variant, using hand-crafted features, operates at 60 Hz on a single CPU, while obtaining 65.0% AUC on OTB-2015.

Ranked #13 on Visual Object Tracking on VOT2017/18

Visual Object Tracking

611

Paper
Code

Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking

no code implementations • CVPR 2016 • Martin Danelljan, Gustav Häger, Fahad Shahbaz Khan, Michael Felsberg

We propose a novel generic approach for alleviating the problem of corrupted training samples in tracking-by-detection frameworks.

Visual Tracking

Paper
Add Code

Discriminative Scale Space Tracking

no code implementations • 20 Sep 2016 • Martin Danelljan, Gustav Häger, Fahad Shahbaz Khan, Michael Felsberg

Compared to the standard exhaustive scale search, our approach achieves a gain of 2.5% in average overlap precision on the OTB dataset.

Visual Object Tracking

Paper
Add Code

Learning Spatially Regularized Correlation Filters for Visual Tracking

no code implementations • ICCV 2015 • Martin Danelljan, Gustav Häger, Fahad Shahbaz Khan, Michael Felsberg

These methods utilize a periodic assumption of the training samples to efficiently learn a classifier on all patches in the target neighborhood.

Visual Tracking

Paper
Add Code

Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking

1 code implementation • 12 Aug 2016 • Martin Danelljan, Andreas Robinson, Fahad Shahbaz Khan, Michael Felsberg

We also demonstrate the effectiveness of our learning formulation in extensive feature point tracking experiments.

Point Tracking Visual Object Tracking

194

Paper
Code

A Probabilistic Framework for Color-Based Point Set Registration

no code implementations • CVPR 2016 • Martin Danelljan, Giulia Meneghetti, Fahad Shahbaz Khan, Michael Felsberg

On the Stanford Lounge dataset, our approach achieves a relative reduction of the failure rate by 78% compared to the baseline.

Paper
Add Code

Adaptive Color Attributes for Real-Time Visual Tracking

no code implementations • CVPR 2014 • Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg, Joost Van de Weijer

This paper investigates the contribution of color in a tracking-by-detection framework.

Attribute Object Recognition +1

Paper
Add Code

Discriminative Color Descriptors

no code implementations • CVPR 2013 • Rahat Khan, Joost Van de Weijer, Fahad Shahbaz Khan, Damien Muselet, Christophe Ducottet, Cecile Barat

This results in a drop of discriminative power of the color description.

Clustering

Paper
Add Code
