no code implementations • 11 Apr 2024 • Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, Tsung-Yu Lin
The integration of Large Language Models (LLMs) into visual-domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance on vision-language tasks, particularly visual question answering (VQA).
no code implementations • CVPR 2023 • Jishnu Mukhoti, Tsung-Yu Lin, Omid Poursaeed, Rui Wang, Ashish Shah, Philip H. S. Torr, Ser-Nam Lim
We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss that trains an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder.
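As an illustration of the compatibility function described above, here is a minimal PyTorch sketch, assuming patch and text embeddings already projected into a shared space. The function names and the softmax weighting details are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def pacl_compatibility(patch_tokens: torch.Tensor, text_cls: torch.Tensor) -> torch.Tensor:
    """Patch-aligned compatibility for a batch of images and texts (sketch).

    patch_tokens: (B, N, D) vision patch embeddings in the joint space.
    text_cls:     (B, D)    text CLS embeddings in the joint space.
    Returns a (B, B) matrix of image-text compatibility scores.
    """
    patches = F.normalize(patch_tokens, dim=-1)           # (B, N, D)
    texts = F.normalize(text_cls, dim=-1)                 # (B, D)
    # Similarity of every patch of image i to text j: (B_img, B_txt, N).
    sim = torch.einsum("ind,jd->ijn", patches, texts)
    # Patch weights derived from the patch-text alignment itself (assumption).
    weights = sim.softmax(dim=-1)                         # (B, B, N)
    # Weighted-pooled image embedding, one per (image, text) pair.
    pooled = torch.einsum("ijn,ind->ijd", weights, patches)
    pooled = F.normalize(pooled, dim=-1)
    # Final compatibility: cosine between pooled image and text embeddings.
    return torch.einsum("ijd,jd->ij", pooled, texts)

def clip_style_loss(scores: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Standard symmetric InfoNCE over the (B, B) compatibility matrix."""
    logits = scores / temperature
    targets = torch.arange(scores.size(0), device=scores.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```

A CLIP-style contrastive loss can then be applied to this score matrix in place of the usual CLS-to-CLS cosine similarity.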
no code implementations • 24 Sep 2022 • Jishnu Mukhoti, Tsung-Yu Lin, Bor-Chun Chen, Ashish Shah, Philip H. S. Torr, Puneet K. Dokania, Ser-Nam Lim
In this paper, we define two categories of OoD data using the subtly different concepts of perceptual/visual and semantic similarity to in-distribution (iD) data.
Tasks: Out-of-Distribution Detection +2
no code implementations • 29 Sep 2021 • Ze Wang, Yipin Zhou, Rui Wang, Tsung-Yu Lin, Ashish Shah, Ser-Nam Lim
Anything outside of a given normal population is by definition an anomaly.
no code implementations • 2 Jul 2019 • Tsung-Yu Lin, Mikayla Timm, Chenyun Wu, Subhransu Maji
We analyze how categories from recent FGVC challenges can be described by their textural content.
no code implementations • ECCV 2018 • Tsung-Yu Lin, Subhransu Maji, Piotr Koniusz
In this paper, we study a class of orderless aggregation functions designed to minimize interference or equalize contributions in the context of second-order features. We show that they can be computed just as efficiently as their first-order counterparts and that they have favorable properties over aggregation by summation.
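For intuition, the sketch below shows one way to equalize descriptor contributions in second-order aggregation: a Sinkhorn-style fixed-point iteration that reweights descriptors before sum-pooling their outer products. The function name, iteration count, and clamping are assumptions; the paper's γ-democratic family interpolates between this kind of democratic pooling and plain summation.

```python
import torch

def democratic_second_order(x: torch.Tensor, n_iter: int = 10, eps: float = 1e-6) -> torch.Tensor:
    """Sketch of democratic pooling of second-order features.

    x: (N, D) local descriptors. Plain second-order pooling is
    sum_i outer(x_i, x_i); here weights alpha_i are chosen so each
    descriptor contributes roughly equally, via a Sinkhorn-style
    fixed point on the squared Gram matrix.
    Returns the (D, D) aggregated second-order representation.
    """
    # Inner products between second-order features are squared
    # first-order inner products.
    K2 = (x @ x.t()) ** 2                                  # (N, N)
    alpha = torch.ones(x.size(0), dtype=x.dtype, device=x.device)
    for _ in range(n_iter):
        # Total contribution of each descriptor under current weights.
        contrib = alpha * (K2 @ alpha)
        # Dampened update pushing all contributions toward a constant.
        alpha = alpha / contrib.clamp(min=eps).sqrt()
    weighted = alpha.unsqueeze(1) * x                      # (N, D)
    return weighted.t() @ x   # equals sum_i alpha_i * outer(x_i, x_i)
```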
no code implementations • 21 Jul 2017 • Tsung-Yu Lin, Subhransu Maji
We present an alternative scheme for computing gradients that is faster yet still offers improvements over the baseline model.
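For context, matrix normalization of second-order features is often the gradient bottleneck. The sketch below uses the coupled Newton-Schulz iteration, a common autograd-friendly way to differentiate through a matrix square root using only matrix multiplies; it is offered as a plausible illustration, not necessarily the exact gradient scheme this paper proposes.

```python
import torch

def matrix_sqrt_newton_schulz(A: torch.Tensor, n_iter: int = 5) -> torch.Tensor:
    """Approximate matrix square root via coupled Newton-Schulz iteration.

    Built only from matrix multiplies, so autograd differentiates through it
    directly, avoiding an SVD in both the forward and backward pass.
    A: (D, D) symmetric positive semi-definite matrix
       (e.g. a pooled bilinear feature matrix).
    """
    d = A.size(0)
    norm = A.norm()                        # pre-scale so the iteration converges
    Y = A / norm
    I = torch.eye(d, dtype=A.dtype, device=A.device)
    Z = I.clone()
    for _ in range(n_iter):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T                          # Y -> sqrt(A / norm)
        Z = T @ Z                          # Z -> inverse sqrt(A / norm)
    return Y * norm.sqrt()                 # undo the pre-scaling
```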
no code implementations • 1 Dec 2015 • Tsung-Yu Lin, Tsung-Wei Ke, Tyng-Luh Liu
We address the problem of converting large-scale high-dimensional image data into binary codes so that approximate nearest-neighbor search over them can be efficiently performed.
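As a point of reference for this problem setup, here is a generic sign-LSH baseline in NumPy: random-projection binary codes with brute-force Hamming-distance search. This is purely illustrative of binary-code retrieval; it is not the learned hashing scheme the paper proposes.

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(x: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Binary codes from the sign of random projections (generic baseline).

    x: (N, D) data; projection: (D, B) random matrix -> (N, B) 0/1 codes.
    """
    return (x @ projection > 0).astype(np.uint8)

def hamming_knn(query_code: np.ndarray, db_codes: np.ndarray, k: int = 5) -> np.ndarray:
    """Brute-force k-nearest neighbors in Hamming space over the codes."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:k]

# Usage: 10k 512-d vectors hashed to 64-bit codes.
D, B = 512, 64
data = rng.standard_normal((10_000, D)).astype(np.float32)
P = rng.standard_normal((D, B)).astype(np.float32)
codes = hash_codes(data, P)
query = hash_codes(data[:1], P)
print(hamming_knn(query, codes))           # indices of nearest codes
```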
no code implementations • ICCV 2015 • Tsung-Yu Lin, Aruni RoyChowdhury, Subhransu Maji
We propose bilinear models, a recognition architecture consisting of two feature extractors whose outputs are multiplied using the outer product at each location of the image and pooled to obtain an image descriptor (a minimal sketch of this pooling step follows this entry).
Ranked #62 on Fine-Grained Image Classification on CUB-200-2011
Tasks: Fine-Grained Image Classification, Fine-Grained Visual Recognition
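Here is a minimal PyTorch sketch of the pooling step described in the abstract: per-location outer products of two feature maps, sum-pooled, then passed through the signed square root and L2 normalization. The feature extractors themselves and the exact normalization constants are left as assumptions.

```python
import torch
import torch.nn.functional as F

def bilinear_pool(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Bilinear pooling of two CNN feature maps over the same image (sketch).

    feat_a: (B, Ca, H, W) and feat_b: (B, Cb, H, W) from two extractors.
    At every location the outer product of the two descriptors is taken,
    then pooled over locations. Returns (B, Ca * Cb) image descriptors.
    """
    B, Ca, H, W = feat_a.shape
    Cb = feat_b.size(1)
    a = feat_a.reshape(B, Ca, H * W)
    b = feat_b.reshape(B, Cb, H * W)
    # Pooling outer products over locations == one batched matrix product.
    x = torch.bmm(a, b.transpose(1, 2)).reshape(B, Ca * Cb) / (H * W)
    x = torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-10)   # signed sqrt
    return F.normalize(x, dim=-1)                           # L2 normalization
```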
no code implementations • CVPR 2016 • Tsung-Yu Lin, Subhransu Maji
A number of recent approaches have used deep convolutional neural networks (CNNs) to build texture representations.
no code implementations • 3 Jun 2015 • Aruni RoyChowdhury, Tsung-Yu Lin, Subhransu Maji, Erik Learned-Miller
We demonstrate the performance of the B-CNN model starting from an AlexNet-style network pre-trained on ImageNet.
4 code implementations • 29 Apr 2015 • Tsung-Yu Lin, Aruni RoyChowdhury, Subhransu Maji
We then present a systematic analysis of these networks and show that (1) the bilinear features are highly redundant and can be reduced by an order of magnitude in size without significant loss in accuracy, (2) they are also effective for other image classification tasks such as texture and scene recognition, and (3) they can be trained from scratch on the ImageNet dataset, offering consistent improvements over the baseline architecture. A sketch of the dimensionality reduction in (1) follows this entry.
Ranked #23 on Fine-Grained Image Classification on NABirds
Tasks: Fine-Grained Image Classification, Fine-Grained Visual Recognition +1
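To illustrate point (1), projecting one of the two streams to a much lower dimension before the outer product shrinks the descriptor from C×C to C×r. The sketch below assumes a generic projection matrix `proj` (e.g. obtained via PCA); it illustrates the size reduction, not the paper's exact procedure.

```python
import torch

def reduced_bilinear(feat: torch.Tensor, proj: torch.Tensor) -> torch.Tensor:
    """Shrink bilinear features by projecting one stream first (sketch).

    feat: (B, C, H, W) feature map; proj: (C, r) projection with r << C.
    The outer product of the original and projected streams yields a
    (B, C * r) descriptor instead of (B, C * C), roughly an order of
    magnitude smaller for r ~ C / 10.
    """
    B, C, H, W = feat.shape
    x = feat.reshape(B, C, H * W)
    y = torch.einsum("bcl,cr->brl", x, proj)    # projected stream, (B, r, HW)
    pooled = torch.bmm(x, y.transpose(1, 2))    # (B, C, r) pooled outer products
    return pooled.reshape(B, -1) / (H * W)
```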