no code implementations • 26 Mar 2024 • Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori
Although there is a trade-off between the two models, both are applicable to 3D facial makeup estimation and related applications.
1 code implementation • 5 Jan 2024 • Yuta Okuyama, Yuki Endo, Yoshihiro Kanamori
Because this initial textured body model has artifacts due to occlusion and the inaccurate body shape, the rendered image undergoes a diffusion-based refinement, in which too strong noise destroys the body structure and identity, whereas too weak noise fails to remove the artifacts.
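The noise-strength trade-off above can be illustrated with a minimal, SDEdit-style sketch (an assumption for illustration, not the paper's exact pipeline): the rendered image is partially noised before denoising, and the `strength` parameter controls how much of the input survives.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image, strength):
    """Partially noise an image before diffusion-based refinement.

    strength in [0, 1]: 0 keeps the input unchanged (artifacts stay),
    1 replaces it with pure noise (structure and identity are lost).
    Intermediate values trade off the two failure modes.
    """
    noise = rng.standard_normal(image.shape)
    return np.sqrt(1.0 - strength) * image + np.sqrt(strength) * noise

rendered = np.ones((4, 4))          # stand-in for the rendered body image
mild = add_noise(rendered, 0.1)     # mostly preserves the input
strong = add_noise(rendered, 0.9)   # mostly destroys the input
```

A refinement pipeline would then denoise `mild` or `strong` back to the image manifold; the paper's contribution lies in navigating this trade-off.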
1 code implementation • 11 Aug 2023 • Yuki Endo
To address this issue, we propose masked-attention guidance, which generates images more faithful to semantic masks by indirectly controlling the attention between each word and pixel through manipulation of the noise images fed to diffusion models.
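A toy sketch of the underlying idea, under the assumption that faithfulness is measured by how much of a word's cross-attention mass falls outside its semantic-mask region (the paper's actual guidance formulation may differ):

```python
import numpy as np

def mask_guidance_loss(attn, mask):
    """Attention mass a word receives outside its semantic-mask region.

    attn: (H, W) cross-attention map for one word.
    mask: (H, W) binary mask, 1 where the word's region lies.
    Gradients of this loss could steer the noise image toward layouts
    where the word attends only inside its mask.
    """
    attn = attn / (attn.sum() + 1e-8)          # normalize to a distribution
    return float((attn * (1.0 - mask)).sum())  # mass outside the mask

H = W = 4
mask = np.zeros((H, W))
mask[:2, :] = 1.0                # the word should occupy the top half
attn_good = mask.copy()          # all attention inside the mask region
attn_bad = 1.0 - mask            # all attention outside the mask region
```

Here `attn_good` yields a loss near 0 and `attn_bad` a loss near 1, so minimizing the loss pushes attention into the masked region.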
no code implementations • 26 May 2023 • Takato Yoshikawa, Yuki Endo, Yoshihiro Kanamori
We propose a framework for text-guided full-body human image synthesis via an attention-based latent code mapper, which enables more disentangled control of StyleGAN than existing mappers.
1 code implementation • 26 Aug 2022 • Yuki Endo
In our framework, the user annotates a StyleGAN image with the locations to move (and those to keep fixed) and specifies a movement direction by mouse dragging.
1 code implementation • 25 Jun 2021 • Yuki Endo, Yoshihiro Kanamori
To handle individual factors that determine object styles, we propose a class- and layer-wise extension to the variational autoencoder (VAE) framework that allows flexible control over each object class at the local to global levels by learning multiple latent spaces.
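The class- and layer-wise latent layout can be sketched as follows; the class names, code dimension, and two-level hierarchy below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # Standard VAE reparameterization trick: z = mu + sigma * eps
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# One latent code per (object class, level) pair, so a single class
# can be restyled at the local or global level without touching others.
classes = ["sky", "building", "road"]   # hypothetical semantic classes
levels = ["local", "global"]            # hypothetical hierarchy levels
latents = {
    (c, lv): reparameterize(np.zeros(8), np.zeros(8))
    for c in classes for lv in levels
}

# Editing one object style = resampling a single entry, leaving the rest fixed.
latents[("sky", "global")] = reparameterize(np.ones(8), np.zeros(8))
```

The key property is the factored layout: each (class, level) pair owns its own latent space, which is what enables the flexible per-class control the abstract describes.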
1 code implementation • 27 Mar 2021 • Yuki Endo, Yoshihiro Kanamori
This paper tackles the challenging problem of generating photorealistic images from semantic layouts in few-shot scenarios, where annotated training pairs are hardly available because pixel-wise annotation is quite costly.
1 code implementation • 16 Oct 2019 • Yuki Endo, Yoshihiro Kanamori, Shigeru Kuriyama
Automatic generation of a high-quality video from a single image remains a challenging task despite the recent advances in deep generative models.
no code implementations • 7 Aug 2019 • Yoshihiro Kanamori, Yuki Endo
Based on supervised learning using convolutional neural networks (CNNs), we infer not only an albedo map and illumination but also a light transport map that encodes occlusion as nine spherical harmonics (SH) coefficients per pixel.
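Nine coefficients correspond to the real SH basis up to band 2. A minimal sketch of how such per-pixel coefficients yield shading (using the standard real SH constants; the paper's exact shading model is not shown here):

```python
import math

def sh_basis(x, y, z):
    """Real spherical harmonics basis, bands 0-2 (9 values), for a unit
    direction (x, y, z), in the standard real SH convention."""
    return [
        0.282095,                      # l=0,  m=0
        0.488603 * y,                  # l=1,  m=-1
        0.488603 * z,                  # l=1,  m=0
        0.488603 * x,                  # l=1,  m=+1
        1.092548 * x * y,              # l=2,  m=-2
        1.092548 * y * z,              # l=2,  m=-1
        0.315392 * (3 * z * z - 1),    # l=2,  m=0
        1.092548 * x * z,              # l=2,  m=+1
        0.546274 * (x * x - y * y),    # l=2,  m=+2
    ]

def shade(coeffs, normal):
    """Per-pixel shading as a dot product of the pixel's nine transport
    coefficients with the SH basis evaluated at the surface normal."""
    x, y, z = normal
    return sum(c * b for c, b in zip(coeffs, sh_basis(x, y, z)))
```

Because occlusion is baked into the nine per-pixel coefficients, relighting reduces to this cheap dot product for any new SH-projected illumination.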