Search Results for author: Yuchi Wang

Found 6 papers, 4 papers with code

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

1 code implementation • 24 May 2024 • Yuchi Wang, Junliang Guo, Jianhong Bai, Runyi Yu, Tianyu He, Xu Tan, Xu sun, Jiang Bian

Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable.

Paper
Code

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

1 code implementation • 16 Apr 2024 • Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu sun

Diffusion models have exhibited remarkable capabilities in text-to-image generation.

Ranked #8 on Image Captioning on COCO Captions (ROUGE-L metric)

Image Captioning Text Generation +1

Paper
Code

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

1 code implementation • 21 Feb 2024 • Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Xiangdi Meng, Tianyu Liu, Baobao Chang

To address this, we introduce Embodied-Instruction-Evolution (EIE), an automatic framework for synthesizing instruction tuning examples in multimodal embodied environments.

Autonomous Driving Decision Making

Paper
Code

UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing

no code implementations • 20 Feb 2024 • Jianhong Bai, Tianyu He, Yuchi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, Jiang Bian

Recent advances in text-guided video editing have showcased promising results in appearance editing (e. g., stylization).

Video Editing

Paper
Add Code

GAIA: Zero-shot Talking Avatar Generation

no code implementations • 26 Nov 2023 • Tianyu He, Junliang Guo, Runyi Yu, Yuchi Wang, Jialiang Zhu, Kaikai An, Leyi Li, Xu Tan, Chunyu Wang, Han Hu, HsiangTao Wu, Sheng Zhao, Jiang Bian

Zero-shot talking avatar generation aims at synthesizing natural talking videos from speech and a single portrait image.

Paper
Add Code

Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond

1 code implementation • 3 Oct 2023 • Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Tianyu Liu, Baobao Chang

In this study, we explore the potential of Multimodal Large Language Models (MLLMs) in improving embodied decision-making processes for agents.

Decision Making Language Modelling +2

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.