no code implementations • 28 Sep 2023 • Manish Sharma, Moitreya Chatterjee, Kuan-Chuan Peng, Suhas Lohit, Michael Jones
We first pretrain these factor matrices on the RGB modality, for which plenty of training data are assumed to exist, and then augment only a few trainable parameters for training on the IR modality to avoid over-fitting, while encouraging them to capture complementary cues from those trained only on the RGB modality.
no code implementations • 6 Jun 2023 • Xiulong Liu, Sudipta Paul, Moitreya Chatterjee, Anoop Cherian
Audio-visual navigation of an agent towards locating an audio goal is a challenging task, especially when the audio is sporadic or the environment is noisy.
no code implementations • 29 Oct 2022 • Moitreya Chatterjee, Narendra Ahuja, Anoop Cherian
In this paper, we propose to use this connection between audio and visual dynamics for solving two challenging tasks simultaneously, namely: (i) separating audio sources from a mixture using visual cues, and (ii) predicting the 3D visual motion of a sounding source using its separated audio.
no code implementations • ICCV 2021 • Moitreya Chatterjee, Narendra Ahuja, Anoop Cherian
Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena.
no code implementations • ICCV 2021 • Moitreya Chatterjee, Jonathan Le Roux, Narendra Ahuja, Anoop Cherian
At its core, AVSGS uses a recursive neural network that emits mutually-orthogonal sub-graph embeddings of the visual graph using multi-head attention.
no code implementations • 1 Jan 2021 • Moitreya Chatterjee, Anoop Cherian, Narendra Ahuja
Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena.
no code implementations • ECCV 2020 • Anoop Cherian, Moitreya Chatterjee, Narendra Ahuja
To tackle this problem, we present Sound2Sight, a deep variational framework that is trained to learn a per-frame stochastic prior conditioned on a joint embedding of audio and past frames.
no code implementations • 8 Jul 2020 • Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian
Given an input video, its associated audio, and a brief caption, the audio-visual scene aware dialog (AVSD) task requires an agent to engage in a question-answer dialog with a human about the audio-visual content.
no code implementations • ECCV 2018 • Moitreya Chatterjee, Alexander G. Schwing
Paragraph generation from images, which has gained popularity recently, is an important task for video summarization, editing, and support of the disabled.
no code implementations • ECCV 2018 • Abhimanyu Dubey, Moitreya Chatterjee, Narendra Ahuja
We propose a novel Convolutional Neural Network (CNN) compression algorithm based on coreset representations of filters.
1 code implementation • NeurIPS 2016 • Arulkumar Subramaniam, Moitreya Chatterjee, Anurag Mittal
A novel inexact matching technique then matches pixels in the first representation with those of the second.
no code implementations • 27 Apr 2015 • Moitreya Chatterjee, Anton Leuski
Conventional multimedia annotation/retrieval systems such as the Normalized Continuous Relevance Model (NormCRM) [16] require fully labeled training data for good performance.