Search Results for author: Guan Pang

Found 17 papers, 8 papers with code

SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models

no code implementations • 3 Jun 2024 • Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan

While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge.


Animated Stickers: Bringing Stickers to Life with Video Diffusion

no code implementations • 8 Feb 2024 • David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard, Krishna Narni, Yaqiao Luo, Lawrence Chen, Guan Pang, Ali Thabet, Peter Vajda, Amy Bearman, Licheng Yu

Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion.


LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

no code implementations • 6 Dec 2023 • Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu

Additionally, existing diffusion-based image manipulation models are sub-optimal in controlling the state transition of an action in egocentric image pixel space because of the domain gap.

Image Manipulation Language Modelling +1


DISGO: Automatic End-to-End Evaluation for Scene Text OCR

no code implementations • 25 Aug 2023 • Mei-Yuh Hwang, Yangyang Shi, Ankit Ramchandani, Guan Pang, Praveen Krishnan, Lucas Kabela, Frank Seide, Samyak Datta, Jun Liu

This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents due to the wild content and various image backgrounds.

Machine Translation Optical Character Recognition +2


Text-Conditional Contextualized Avatars For Zero-Shot Personalization

no code implementations • 14 Apr 2023 • Samaneh Azadi, Thomas Hayes, Akbar Shah, Guan Pang, Devi Parikh, Sonal Gupta

Recent large-scale text-to-image generation models have made significant improvements in the quality, realism, and diversity of the synthesized images and enable users to control the created content through language.

Text to 3D Text-to-Image Generation


MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

2 code implementations • 17 Apr 2022 • Thomas Hayes, Songyang Zhang, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, Devi Parikh

Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.

Navigate Retrieval +4


Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer

1 code implementation • 7 Apr 2022 • Songwei Ge, Thomas Hayes, Harry Yang, Xi Yin, Guan Pang, David Jacobs, Jia-Bin Huang, Devi Parikh

Videos are created to express emotion, exchange information, and share experiences.

Ranked #15 on Video Generation on UCF-101

Video Generation


Revisiting Linear Decision Boundaries for Few-Shot Learning with Transformer Hypernetworks

no code implementations • 29 Sep 2021 • Samrudhdhi Bharatkumar Rangrej, Kevin J Liang, Xi Yin, Guan Pang, Theofanis Karaletsos, Lior Wolf, Tal Hassner

Few-shot learning (FSL) methods aim to generalize a model to new unseen classes using only a small number of support examples.

Few-Shot Learning Image Classification


TextStyleBrush: Transfer of Text Aesthetics from a Single Example

1 code implementation • 15 Jun 2021 • Praveen Krishnan, Rama Kovvuri, Guan Pang, Boris Vassilev, Tal Hassner

We present a novel approach for disentangling the content of a text image from all aspects of its appearance.

Disentanglement


TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

no code implementations • CVPR 2021 • Amanpreet Singh, Guan Pang, Mandy Toh, Jing Huang, Wojciech Galuba, Tal Hassner

A crucial component of the scene-text-based reasoning required for the TextVQA and TextCaps datasets involves detecting and recognizing text present in the images using an optical character recognition (OCR) system.

Optical Character Recognition Optical Character Recognition (OCR) +2


A Multiplexed Network for End-to-End, Multilingual OCR

1 code implementation • CVPR 2021 • Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen Krishnan, Xi Yin, Tal Hassner

Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results.

Optical Character Recognition (OCR) Text Detection


img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation

2 code implementations • CVPR 2021 • Vítor Albiero, Xingyu Chen, Xi Yin, Guan Pang, Tal Hassner

Tests on AFLW2000-3D and BIWI show that our method runs in real time and outperforms state-of-the-art (SotA) face pose estimators.

Ranked #6 on Head Pose Estimation on BIWI

3D Face Alignment Face Alignment +3


Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

1 code implementation • ECCV 2020 • Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai

Recent end-to-end trainable methods for scene text spotting, integrating detection and recognition, showed much progress.

Ranked #11 on Text Spotting on Total-Text

Region Proposal Text Spotting


From Satellite Imagery to Disaster Insights

1 code implementation • 17 Dec 2018 • Jigar Doshi, Saikat Basu, Guan Pang

The use of satellite imagery has become increasingly popular for disaster monitoring and response.

Change Detection Disaster Response


Improving Rotated Text Detection with Rotation Region Proposal Networks

no code implementations • 16 Nov 2018 • Jing Huang, Viswanath Sivakumar, Mher Mnatsakanyan, Guan Pang

In this work, we extend the scene-text extraction system at Facebook, Rosetta, to efficiently handle text in various orientations.

Misinformation Region Proposal +1


DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images

1 code implementation • 17 May 2018 • Ilke Demir, Krzysztof Koperski, David Lindenbaum, Guan Pang, Jing Huang, Saikat Basu, Forest Hughes, Devis Tuia, Ramesh Raskar

We present the DeepGlobe 2018 Satellite Image Understanding Challenge, which includes three public competitions for segmentation, detection, and classification tasks on satellite images.


End-to-end Planning of Fixed Millimeter-Wave Networks

no code implementations • 20 May 2017 • Tim Danford, Onur Filiz, Jing Huang, Brian Karrer, Manohar Paluri, Guan Pang, Vish Ponnampalam, Nicolas Stier-Moses, Birce Tezel

This article discusses a framework to support the design and end-to-end planning of fixed millimeter-wave networks.

Combinatorial Optimization

