no code implementations • 3 Jun 2024 • Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan
While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge.
no code implementations • 8 Feb 2024 • David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard, Krishna Narni, Yaqiao Luo, Lawrence Chen, Guan Pang, Ali Thabet, Peter Vajda, Amy Bearman, Licheng Yu
Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion.
no code implementations • 6 Dec 2023 • Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu
Additionally, existing diffusion-based image manipulation models are sub-optimal in controlling the state transition of an action in egocentric image pixel space because of the domain gap.
no code implementations • 25 Aug 2023 • Mei-Yuh Hwang, Yangyang Shi, Ankit Ramchandani, Guan Pang, Praveen Krishnan, Lucas Kabela, Frank Seide, Samyak Datta, Jun Liu
This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents because of the unconstrained, in-the-wild content and varied image backgrounds.
no code implementations • 14 Apr 2023 • Samaneh Azadi, Thomas Hayes, Akbar Shah, Guan Pang, Devi Parikh, Sonal Gupta
Recent large-scale text-to-image generation models have made significant improvements in the quality, realism, and diversity of the synthesized images and enable users to control the created content through language.
2 code implementations • 17 Apr 2022 • Thomas Hayes, Songyang Zhang, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, Devi Parikh
Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.
1 code implementation • 7 Apr 2022 • Songwei Ge, Thomas Hayes, Harry Yang, Xi Yin, Guan Pang, David Jacobs, Jia-Bin Huang, Devi Parikh
Videos are created to express emotion, exchange information, and share experiences.
Ranked #15 on Video Generation on UCF-101
no code implementations • 29 Sep 2021 • Samrudhdhi Bharatkumar Rangrej, Kevin J Liang, Xi Yin, Guan Pang, Theofanis Karaletsos, Lior Wolf, Tal Hassner
Few-shot learning (FSL) methods aim to generalize a model to new unseen classes using only a small number of support examples.
1 code implementation • 15 Jun 2021 • Praveen Krishnan, Rama Kovvuri, Guan Pang, Boris Vassilev, Tal Hassner
We present a novel approach for disentangling the content of a text image from all aspects of its appearance.
no code implementations • CVPR 2021 • Amanpreet Singh, Guan Pang, Mandy Toh, Jing Huang, Wojciech Galuba, Tal Hassner
A crucial component of the scene-text-based reasoning required for the TextVQA and TextCaps datasets involves detecting and recognizing text present in the images using an optical character recognition (OCR) system.
1 code implementation • CVPR 2021 • Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen Krishnan, Xi Yin, Tal Hassner
Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results.
2 code implementations • CVPR 2021 • Vítor Albiero, Xingyu Chen, Xi Yin, Guan Pang, Tal Hassner
Tests on AFLW2000-3D and BIWI show that our method runs in real time and outperforms state-of-the-art (SotA) face pose estimators.
Ranked #6 on Head Pose Estimation on BIWI
1 code implementation • ECCV 2020 • Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai
Recent end-to-end trainable methods for scene text spotting, integrating detection and recognition, showed much progress.
Ranked #11 on Text Spotting on Total-Text
1 code implementation • 17 Dec 2018 • Jigar Doshi, Saikat Basu, Guan Pang
The use of satellite imagery has become increasingly popular for disaster monitoring and response.
no code implementations • 16 Nov 2018 • Jing Huang, Viswanath Sivakumar, Mher Mnatsakanyan, Guan Pang
In this work, we extend the scene-text extraction system at Facebook, Rosetta, to efficiently handle text in various orientations.
1 code implementation • 17 May 2018 • Ilke Demir, Krzysztof Koperski, David Lindenbaum, Guan Pang, Jing Huang, Saikat Basu, Forest Hughes, Devis Tuia, Ramesh Raskar
We present the DeepGlobe 2018 Satellite Image Understanding Challenge, which includes three public competitions for segmentation, detection, and classification tasks on satellite images.
no code implementations • 20 May 2017 • Tim Danford, Onur Filiz, Jing Huang, Brian Karrer, Manohar Paluri, Guan Pang, Vish Ponnampalam, Nicolas Stier-Moses, Birce Tezel
This article discusses a framework to support the design and end-to-end planning of fixed millimeter-wave networks.