In this paper, we present TellMeTalk, a novel approach for generating expressive talking face videos from multimodal inputs. Our approach is robust across identities, languages, expressions, and head movements. It overcomes four key limitations of existing talking face video generation methods: (1) reliance on single-modal learning from audio or text, which forgoes the complementary nature of multimodal inputs; (2) use of traditional convolutional neural network generators, which restricts the capture of spatial features; (3) the absence of natural head move...