VStyclone: Real-time Chinese voice style clone

Publication type:

Journal article

Authors:

Wu, Yichun;Zhao, Huihuang;Liang, Xiaoman;Sun, Yaqi

Corresponding author:

Zhao, Huihuang(happyday.huihuang@gmail.com)

Author affiliations:

[Liang, Xiaoman; Zhao, Huihuang; Sun, Yaqi; Wu, Yichun] Hengyang Normal Univ, Coll Comp Sci & Technol, Hengyang 421002, Peoples R China.

[Liang, Xiaoman; Zhao, Huihuang; Sun, Yaqi] Hunan Prov Key Lab Intelligent Informat Proc & App, Hengyang 421002, Peoples R China.

Corresponding institution:

[Huihuang Zhao] College of Computer Science and Technology, Hengyang Normal University, Hengyang 421002, China; Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang 421002, China

Language:

English

Keywords:

Efficient tone extractor;Style synthesizer;Transformer;Vocoder;Voice clone;VStyclone

Journal:

Computers & Electrical Engineering

ISSN:

0045-7906

Year:

2023

Volume:

105

Pages:

108534

DOI:

10.1016/j.compeleceng.2022.108534

Funding:

This work was supported by the National Natural Science Foundation of China (621772179), the Hunan Provincial Natural Science Foundation of China (2020JJ4152), and the Science and Technology Innovation Program of Hunan Province (2016TP1020).

Institutional attribution:

This institution is the first affiliation

Department:

College of Computer Science and Technology

Abstract:

This paper proposes a novel Chinese speech cloning model named VStyclone, which consists of three stages: multi-speaker training, target speaker encoding, and target speaker synthesis. In this work, we design an efficient tone extractor, which can reallocate resources to the sequences of log-mel spectrogram frames obtained from multiple speakers’ speech, thus allowing the network to learn multiple speakers’ features differently. This approach allows the network to focus more on the voice features of the target speaker and extract the target f...
