Search Results for author: Animesh Sinha

Found 8 papers, 1 paper with code

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation

no code implementations • 7 Dec 2023 • Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua

In this study, we explore Transformer-based diffusion models for image and video generation.

Text-to-Video Generation Video Generation

Gen2Det: Generate to Detect

no code implementations • 7 Dec 2023 • Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava

In the long-tailed detection setting on LVIS, Gen2Det improves performance on rare categories by a large margin while also significantly improving performance on other categories; e.g., we see an improvement of 2.13 Box AP and 1.84 Mask AP over training only on real data on LVIS with Mask R-CNN.

Image Generation Object +2

Context Diffusion: In-Context Aware Image Generation

no code implementations • 6 Dec 2023 • Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic

We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context.

Image Generation In-Context Learning

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

no code implementations • 17 Nov 2023 • Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan

Evaluation results show that our method improves visual quality by 14%, prompt alignment by 16.2%, and scene diversity by 15.3%, compared to prompt engineering the base Emu model for sticker generation.

Image Generation Prompt Engineering

FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning

no code implementations • 26 Oct 2022 • Suvir Mirchandani, Licheng Yu, Mengjiao Wang, Animesh Sinha, WenWen Jiang, Tao Xiang, Ning Zhang

Additionally, prior works have mainly been restricted to multimodal understanding tasks.

Cross-Modal Retrieval Decoder +4

CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval

no code implementations • 15 Feb 2022 • Licheng Yu, Jun Chen, Animesh Sinha, Mengjiao MJ Wang, Hugo Chen, Tamara L. Berg, Ning Zhang

We introduce CommerceMM, a multimodal model capable of providing a diverse and granular understanding of commerce topics associated with a given piece of content (image, text, or image+text) and of generalizing to a wide range of tasks, including Multimodal Categorization, Image-Text Retrieval, Query-to-Product Retrieval, Image-to-Product Retrieval, etc.

Representation Learning Retrieval +1

Large-Scale Attribute-Object Compositions

no code implementations • 24 May 2021 • Filip Radenovic, Animesh Sinha, Albert Gordo, Tamara Berg, Dhruv Mahajan

We study the problem of learning how to predict attribute-object compositions from images, and its generalization to unseen compositions missing from the training data.

Attribute Object

Qubit Routing using Graph Neural Network aided Monte Carlo Tree Search

1 code implementation • 1 Apr 2021 • Animesh Sinha, Utkarsh Azad, Harjinder Singh

Near-term quantum hardware can support two-qubit operations only on the qubits that can interact with each other.