no code implementations • 7 Mar 2024 • Seunghee Han, Se Jin Park, Chae Won Kim, Yong Man Ro
We devise completeness loss and consistency loss based on semantic similarity scores.
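The snippet does not give the exact formulation of these losses, so the following is a minimal sketch under assumptions: semantic similarity is taken as cosine similarity between embeddings, completeness penalizes content a summary embedding fails to cover, and consistency keeps a prediction close to a reference. The function names and shapes are illustrative, not the paper's.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def completeness_loss(summary_emb, segment_embs):
    # Hypothetical sketch: segments poorly covered by the summary
    # (low similarity) contribute a higher loss.
    sims = [cosine_sim(summary_emb, s) for s in segment_embs]
    return float(np.mean([1.0 - s for s in sims]))

def consistency_loss(pred_emb, ref_emb):
    # Hypothetical sketch: keep the prediction semantically close
    # to a reference embedding.
    return 1.0 - cosine_sim(pred_emb, ref_emb)
```

Both terms are zero when embeddings align perfectly and grow as semantic similarity drops.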
no code implementations • 18 Jan 2024 • Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro
By using the visual speech units as the inputs of our system, we pre-train the model to predict corresponding text outputs on massive multilingual data constructed by merging several VSR databases.
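The snippet does not say how the visual speech units are obtained; discrete speech units are commonly produced by quantizing per-frame self-supervised features against k-means centroids, so the sketch below assumes that recipe (the paper's exact procedure may differ). `deduplicate` collapses consecutive repeats, as is typical for unit sequences.

```python
import numpy as np

def features_to_units(features, centroids):
    """Map per-frame visual features (T, d) to discrete unit IDs by
    nearest-centroid quantization. Assumed recipe, not confirmed by
    the snippet."""
    # Squared Euclidean distance from every frame to every centroid.
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def deduplicate(units):
    # Collapse consecutive repeated units into one token.
    out = [units[0]]
    for u in units[1:]:
        if u != out[-1]:
            out.append(u)
    return out
```

The resulting ID sequence can then serve as the token input to a unit-to-text model.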
1 code implementation • 5 Dec 2023 • Jeongsoo Choi, Se Jin Park, Minsu Kim, Yong Man Ro
To mitigate the absence of a parallel AV2AV translation dataset, we propose training our spoken language translation system on an audio-only (A2A) dataset.
no code implementations • 23 Aug 2023 • Se Jin Park, Joanna Hong, Minsu Kim, Yong Man Ro
We contribute a new large-scale 3D facial mesh dataset, 3D-HDTF, to enable the synthesis of variations in identity, pose, and facial motion of 3D face meshes.
no code implementations • 28 Jun 2023 • Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro
The visual speaker embedding is derived from a single target face image and enables improved mapping of input text to the learned audio latent space by incorporating the speaker characteristics inherent in the audio.
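One common way to inject a per-utterance speaker embedding into a text encoder is to broadcast it across the time axis and concatenate it with the encoder states; the sketch below assumes that mechanism (the paper may condition differently, e.g. via addition or cross-attention).

```python
import numpy as np

def condition_on_speaker(text_hidden, speaker_emb):
    """Hypothetical sketch: tile a speaker embedding (e.g. derived from
    a face image) over the text time axis and concatenate it with the
    encoder states, so the decoder can map the text to a
    speaker-specific region of the audio latent space."""
    T = text_hidden.shape[0]
    tiled = np.tile(speaker_emb[None, :], (T, 1))        # (T, d_spk)
    return np.concatenate([text_hidden, tiled], axis=1)  # (T, d_txt + d_spk)
```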
no code implementations • 31 May 2023 • Se Jin Park, Minsu Kim, Jeongsoo Choi, Yong Man Ro
The contextualized lip motion unit then guides the latter in synthesizing a target identity with context-aware lip motion.
no code implementations • 2 Nov 2022 • Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro
It stores lip motion features from sequential ground truth images in the value memory and aligns them with corresponding audio features so that they can be retrieved using audio input at inference time.
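The retrieval step described above can be sketched as key-value attention: audio features act as keys, the stored lip motion features as values, and an audio query at inference returns a similarity-weighted sum of the values. The temperature and dot-product scoring here are assumptions, not the paper's stated design.

```python
import numpy as np

def retrieve_lip_features(audio_query, audio_keys, lip_values, temperature=0.1):
    """Sketch of key-value memory retrieval: audio_keys (N, d_a) were
    aligned with lip_values (N, d_lip) during training; an audio query
    (d_a,) attends over the keys and returns a weighted combination of
    the stored lip motion features."""
    scores = audio_keys @ audio_query / temperature  # (N,) similarities
    w = np.exp(scores - scores.max())
    w /= w.sum()                                     # softmax weights
    return w @ lip_values                            # (d_lip,)
```

A query close to a stored audio key retrieves (approximately) the lip motion feature that was aligned with it.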
no code implementations • 5 Jul 2022 • Agus Gunawan, Muhammad Adi Nugroho, Se Jin Park
We explore a different direction: improving real image denoising performance through a better learning strategy that enables test-time adaptation of the multi-task network.
1 code implementation • ICCV 2021 • Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro
By learning the interrelationship through the associative bridge, the proposed bridging framework can obtain the target modal representations inside the memory network from the source modal input alone, providing rich information for its downstream tasks.
Ranked #3 on Lipreading on CAS-VSR-W1k (LRW-1000)
1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021 • Joanna Hong, Minsu Kim, Se Jin Park, Yong Man Ro
Our key contributions are: (1) proposing the Visual Voice memory, which brings rich audio information that complements the visual features, thus producing high-quality speech from silent video, and (2) enabling multi-speaker and unseen-speaker training by memorizing auditory features and the corresponding visual features.