no code implementations • 7 Dec 2023 • Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua
In this study, we explore Transformer-based diffusion models for image and video generation.
no code implementations • 7 Dec 2023 • Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava
In the long-tailed detection setting on LVIS, Gen2Det improves the performance on rare categories by a large margin while also significantly improving the performance on other categories; e.g., we see an improvement of 2.13 Box AP and 1.84 Mask AP over just training on real data on LVIS with Mask R-CNN.
no code implementations • 6 Dec 2023 • Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic
We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context.
no code implementations • 17 Nov 2023 • Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan
Evaluation results show our method improves visual quality by 14%, prompt alignment by 16.2% and scene diversity by 15.3%, compared to prompt engineering the base Emu model for stickers generation.
no code implementations • 26 Oct 2022 • Suvir Mirchandani, Licheng Yu, Mengjiao Wang, Animesh Sinha, WenWen Jiang, Tao Xiang, Ning Zhang
Additionally, these works have mainly been restricted to multimodal understanding tasks.
no code implementations • 15 Feb 2022 • Licheng Yu, Jun Chen, Animesh Sinha, Mengjiao MJ Wang, Hugo Chen, Tamara L. Berg, Ning Zhang
We introduce CommerceMM, a multimodal model that provides a diverse and granular understanding of commerce topics associated with a given piece of content (image, text, or image+text) and generalizes to a wide range of tasks, including Multimodal Categorization, Image-Text Retrieval, Query-to-Product Retrieval, and Image-to-Product Retrieval.
no code implementations • 24 May 2021 • Filip Radenovic, Animesh Sinha, Albert Gordo, Tamara Berg, Dhruv Mahajan
We study the problem of learning how to predict attribute-object compositions from images, and its generalization to unseen compositions missing from the training data.
1 code implementation • 1 Apr 2021 • Animesh Sinha, Utkarsh Azad, Harjinder Singh
Near-term quantum hardware can support two-qubit operations only on the qubits that can interact with each other.