LS-GAN: Iterative Language-based Image Manipulation via Long and Short Term Consistency Reasoning

Iterative language-based image manipulation aims to edit images step by step according to the user's linguistic instructions. Existing methods mostly focus on aligning the attributes and appearance of newly added visual elements with the current instruction. However, they fail to maintain consistency between instructions and images as the number of iterative rounds increases. To address this issue, we propose a novel Long and Short term consistency reasoning Generative Adversarial Network (LS-GAN), which enhances awareness of previously generated objects with respect to the current instruction and better maintains consistency with the user's intent across successive iterations. Specifically, we first design a Context-aware Phrase Encoder (CPE) to learn the user's intention by extracting phrase-level information from the instruction. We then introduce a Long and Short term Consistency Reasoning (LSCR) mechanism: long-term reasoning improves the model's semantic understanding and positional reasoning, while short-term reasoning ensures the ability to construct visual scenes from linguistic instructions. Extensive results show that LS-GAN improves generation quality in terms of both object identity and position, and achieves state-of-the-art performance on two public datasets.
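
The page does not include code, so the following is a minimal PyTorch sketch of how the two described components might fit together: a phrase-level instruction encoder (in the spirit of CPE) and a per-round reasoning step that carries long-term history across iterations while fusing in the short-term (current) instruction (in the spirit of LSCR). All class names, dimensions, and the gating scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ContextAwarePhraseEncoder(nn.Module):
    """Hypothetical stand-in for CPE: embeds an instruction, runs a BiGRU,
    and pools attention-weighted phrase features into one vector."""
    def __init__(self, vocab_size=1000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)

    def forward(self, tokens):                      # tokens: (B, T) word ids
        h, _ = self.rnn(self.embed(tokens))         # (B, T, 2H)
        w = torch.softmax(self.attn(h), dim=1)      # phrase-level attention weights
        return (w * h).sum(dim=1)                   # (B, 2H) instruction feature


class ConsistencyReasoner(nn.Module):
    """Hypothetical LSCR sketch: a GRU cell accumulates long-term history over
    editing rounds; a learned gate fuses it with the current instruction."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.long_term = nn.GRUCell(feat_dim, feat_dim)   # memory over previous rounds
        self.short_gate = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.Sigmoid())

    def forward(self, instr_feat, history):
        history = self.long_term(instr_feat, history)     # update long-term state
        gate = self.short_gate(torch.cat([instr_feat, history], dim=-1))
        fused = gate * instr_feat + (1 - gate) * history  # short/long-term fusion
        return fused, history


# Toy usage: three editing rounds over random token ids.
encoder, reasoner = ContextAwarePhraseEncoder(), ConsistencyReasoner()
history = torch.zeros(1, 512)
for step in range(3):
    tokens = torch.randint(0, 1000, (1, 8))               # one instruction per round
    fused, history = reasoner(encoder(tokens), history)
    # `fused` would then condition the image generator for this round
```

In this sketch the gate decides, per feature, how much the current instruction should override the accumulated scene memory, which is one plausible way to keep new edits consistent with earlier ones; the paper's actual mechanism may differ.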
