VStyclone: Real-time Chinese voice style clone

Publication type:
Journal article
Authors:
Wu, Yichun;Zhao, Huihuang;Liang, Xiaoman;Sun, Yaqi
Corresponding author:
Zhao, Huihuang(happyday.huihuang@gmail.com)
Author affiliations:
[Liang, Xiaoman; Zhao, Huihuang; Sun, Yaqi; Wu, Yichun] Hengyang Normal Univ, Coll Comp Sci & Technol, Hengyang 421002, Peoples R China.
[Liang, Xiaoman; Zhao, Huihuang; Sun, Yaqi] Hunan Prov Key Lab Intelligent Informat Proc & App, Hengyang 421002, Peoples R China.
Corresponding institution:
[Huihuang Zhao] College of Computer Science and Technology, Hengyang Normal University, Hengyang 421002, China; Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang 421002, China
Language:
English
Keywords:
VStyclone;Voice clone;Efficient tone extractor;Style synthesizer;Transformer;Vocoder
Journal:
Computers & Electrical Engineering
ISSN:
0045-7906
Year:
2023
Volume:
105
Pages:
108534
Funding:
This work was supported by the National Natural Science Foundation of China (621772179), the Hunan Provincial Natural Science Foundation of China (2020JJ4152), and the Science and Technology Innovation Program of Hunan Province (2016TP1020).
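Institutional attribution:
This university is the first-listed institution
Department:
College of Computer Science and Technology
Abstract: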
This paper proposes a novel Chinese speech cloning model named VStyclone, which consists of three stages: multi-speaker training, target speaker encoding, and target speaker synthesis. In this work, we design an efficient tone extractor, which can reallocate resources to the sequences of log-mel spectrogram frames obtained from multiple speakers’ speech, thus allowing the network to learn multiple speakers’ features differently. This approach allows the network to focus more on the voice features of the target speaker and extract the target f...
