no code implementations • 27 Apr 2024 • Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang
Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks.
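The contrastive alignment at the core of CLAP can be illustrated with a symmetric InfoNCE objective over a batch of paired audio/text embeddings. This is a minimal NumPy sketch of that generic loss, not the authors' implementation; the function name and the `temperature` default are illustrative choices.

```python
import numpy as np

def clap_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    audio_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    (Illustrative sketch of a CLIP/CLAP-style contrastive objective.)
    """
    # L2-normalize so the dot products below are cosine similarities
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature          # (batch, batch) similarity matrix
    labels = np.arange(len(a))              # matched pairs lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)   # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the audio->text and text->audio directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimizing this loss pulls each audio embedding toward the embedding of its paired caption and pushes it away from the other captions in the batch, which is what makes the shared space usable for retrieval and zero-shot classification.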
no code implementations • 19 Jan 2024 • Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Zhuo Chen, Lei Xie, Yuping Wang, Yuxuan Wang
Specifically, to enable streaming capability, StreamVoice employs a fully causal, context-aware LM with a temporal-independent acoustic predictor, alternately processing semantic and acoustic features at each autoregression step, which eliminates the dependence on complete source speech.
no code implementations • 18 Jun 2023 • Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang
An intuitive approach is to follow AudioLM: tokenize speech into semantic and acoustic tokens with HuBERT and SoundStream, respectively, and convert the source semantic tokens into target acoustic tokens conditioned on acoustic tokens of the target speaker.
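The AudioLM-style data flow described above can be sketched as follows. The two tokenizers here are toy stand-ins (simple energy/peak quantizers) for HuBERT and SoundStream, which are large learned models, and `token_lm` is a hypothetical placeholder for the token language model; only the pipeline structure is real.

```python
import numpy as np

# Toy stand-ins for HuBERT (semantic tokens) and SoundStream (acoustic tokens):
# each just quantizes a per-frame statistic into a small codebook. Real systems
# use learned self-supervised / neural-codec models; this only shows data flow.

def semantic_tokenize(wave, frame=160, n_codes=32):
    frames = wave[: len(wave) // frame * frame].reshape(-1, frame)
    energy = (frames ** 2).mean(axis=1)
    return np.digitize(energy, np.linspace(energy.min(), energy.max(), n_codes - 1))

def acoustic_tokenize(wave, frame=80, n_codes=256):
    frames = wave[: len(wave) // frame * frame].reshape(-1, frame)
    peak = np.abs(frames).max(axis=1)
    return np.digitize(peak, np.linspace(peak.min(), peak.max(), n_codes - 1))

def convert(source_wave, target_wave, token_lm):
    """AudioLM-style VC sketch: predict target-speaker acoustic tokens from
    source semantic tokens, conditioned on an acoustic prompt of the target."""
    sem = semantic_tokenize(source_wave)          # "what is said"
    prompt = acoustic_tokenize(target_wave)[:50]  # "how the target sounds"
    # The predicted acoustic tokens would then be decoded back to a waveform
    # by the codec decoder (omitted here).
    return token_lm(sem, prompt)
```

The key property this structure gives is the separation of content (semantic tokens) from speaker identity (acoustic tokens), so conversion reduces to sequence-to-sequence token prediction.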
no code implementations • 12 May 2023 • Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang
Specifically, to flexibly adapt to the dynamically varying speaker characteristics along the temporal and channel axes of speech, we propose a novel fine-grained speaker modeling method, temporal-channel retrieval (TCR), which identifies when and where speaker information appears in speech.
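One way to picture retrieval along both axes is attention-style weighting over time followed by re-weighting over channels. The sketch below is a hypothetical illustration of that idea using a single query vector; it is not the paper's TCR module, whose actual architecture is not specified in this excerpt.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_channel_retrieval(feats, query):
    """Hypothetical sketch: attend over the temporal axis, then the channel
    axis, of reference-speech features so the model can pick out *when* and
    *where* speaker cues appear.

    feats: (T, C) reference-speech features; query: (C,) speaker query vector.
    """
    # Temporal retrieval: which frames carry speaker information
    t_scores = softmax(feats @ query)      # (T,) attention weights over time
    t_summary = t_scores @ feats           # (C,) time-weighted summary

    # Channel retrieval: which feature channels carry speaker information
    c_scores = softmax(t_summary * query)  # (C,) weights over channels
    return c_scores * t_summary            # re-weighted speaker embedding
```

In a real system the query would be learned (or derived from an utterance-level speaker embedding) rather than supplied by hand.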
no code implementations • 12 Dec 2022 • Dongya Jia, Qiao Tian, Kainan Peng, Jiaxin Li, Yuanzhe Chen, Mingbo Ma, Yuping Wang, Yuxuan Wang
The goal of accent conversion (AC) is to convert the accent of speech into the target accent while preserving the content and speaker identity.
no code implementations • 16 Nov 2022 • Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang
Conveying the linguistic content while maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC).
no code implementations • 27 Oct 2022 • Yuanzhe Chen, Ming Tu, Tang Li, Xin Li, Qiuqiang Kong, Jiaxin Li, Zhichao Wang, Qiao Tian, Yuping Wang, Yuxuan Wang
In this paper, we propose to use intermediate bottleneck features (IBFs) to replace PPGs.
no code implementations • 7 Oct 2021 • Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang
(2) How to clone a person's voice while controlling the style and prosody.
1 code implementation • 30 Oct 2017 • Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, Huamin Qu
We propose a technique to explain the function of individual hidden state units based on their expected response to input texts.
no code implementations • 21 Feb 2015 • Yuanzhe Chen, Weiyao Lin, Chongyang Zhang, Zhenzhong Chen, Ning Xu, Jun Xie
In this paper, we propose a new intra-and-inter-constraint-based video enhancement approach aiming to 1) achieve high intra-frame quality of the entire picture where multiple region-of-interests (ROIs) can be adaptively and simultaneously enhanced, and 2) guarantee the inter-frame quality consistencies among video frames.
no code implementations • 21 Feb 2015 • Weiyao Lin, Yuanzhe Chen, Jianxin Wu, Hanli Wang, Bin Sheng, Hongxiang Li
Based on this network, we further model people in the scene as packages while human activities can be modeled as the process of package transmission in the network.