
Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice

ESI Hot Paper; ESI Highly Cited Paper; SCI-E; EI
WoS citations: 70
Publication type:
Journal article
Authors:
Peng, Xiaojiang;Wang, Limin;Wang, Xingxing;Qiao, Yu
Corresponding author:
Peng, Xiaojiang(xiaojiang.peng@inria.fr)
Author affiliations:
[Peng, Xiaojiang] College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
[Peng, Xiaojiang] LEAR Team, INRIA, Grenoble, France
[Wang, Limin; Wang, Xingxing; Qiao, Yu] Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Computer Vision Lab, ETH Zurich, Zurich, Switzerland
Corresponding author affiliations:
[Peng, Xiaojiang] Hengyang Normal Univ, Coll Comp Sci & Technol, Hengyang, Peoples R China.
[Peng, Xiaojiang] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China.
[Peng, Xiaojiang] INRIA, LEAR Team, Grenoble, France.
Language:
English
Keywords:
Action recognition; Bag of visual words; Fusion methods; Feature encoding
Journal:
Computer Vision and Image Understanding
ISSN:
1077-3142
Year:
2015
Volume:
150
Pages:
109-125
Document type:
WoS: Article; EI: Journal article (JA)
Subject categories:
ESI category: Computer Science; WoS categories: Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
Funding:
Natural Science Foundation of China [61502152]; Guangdong Innovative Research Program [201001D0104648280, 20148050505017, 20158010129013]; Shenzhen Basic Research Program [KQCX2015033117354153, JCYJ20120903092050890, JCYJ20130402113127496]; Open Projects Program of National Laboratory of Pattern Recognition
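Institutional attribution:
This university is both the first-listed and the corresponding institution.
Department:
College of Computer Science and Technology
Abstract: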
Video-based action recognition is an important and challenging problem in computer vision research. The bag of visual words (BoVW) model with local features has long been popular and has achieved state-of-the-art performance on several realistic datasets, such as HMDB51, UCF50, and UCF101. BoVW is a general pipeline for constructing a global representation from local features, and it mainly consists of five steps: (i) feature extraction, (ii) feature pre-processing, (iii) codebook generation, (iv) feature encoding, and (v) pooling and normalization. Although many efforts have been made on each step independently in different scenarios, their effects on action recognition remain unclear. Meanwhile, video data exhibit different views of visual patterns, such as static appearance and motion dynamics. Multiple descriptors are usually extracted to represent these different views, and fusing them is crucial for boosting the final performance of an action recognition system. This paper provides a comprehensive study of all steps in BoVW and of different fusion methods, and uncovers good practices for building a state-of-the-art action recognition system. Specifically, we explore two kinds of local features, ten kinds of encoding methods, eight kinds of pooling and normalization strategies, and three kinds of fusion methods. We conclude that every step contributes crucially to the final recognition rate, and that an improper choice in any one step may counteract the performance gains of the other steps. Furthermore, based on this study, we propose a simple yet effective representation, called the hybrid supervector, which exploits the complementarity of different BoVW frameworks with improved dense trajectories. Using this representation, we obtain impressive results on three challenging datasets: HMDB51 (61.9%), UCF50 (92.3%), and UCF101 (87.9%).
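To make steps (iii) to (v) and the idea of fusing several descriptor types more concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation and not the hybrid supervector. It assumes: toy Gaussian vectors stand in for real local descriptors (steps (i) and (ii) are skipped), the codebook is learned with plain Lloyd's k-means, VLAD is used as one well-known encoding (the abstract does not name the ten encodings compared), signed square-root plus L2 normalization is used as one common normalization choice, and fusion is done by simply concatenating per-descriptor encodings. All function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, centers):
    """Index of the codeword closest to descriptor x (hard assignment)."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def kmeans(points, k, iters=20, seed=0):
    """Step (iii): learn a k-word codebook with plain Lloyd's k-means."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def vlad(descriptors, centers):
    """Steps (iv)-(v): VLAD encoding, sum-pooling residuals per codeword."""
    dim = len(centers[0])
    acc = [[0.0] * dim for _ in centers]
    for x in descriptors:
        j = nearest(x, centers)
        for i in range(dim):
            acc[j][i] += x[i] - centers[j][i]
    return [v for row in acc for v in row]  # flatten to length k * dim


def power_l2(vec, alpha=0.5):
    """Step (v): signed power normalization, then L2 normalization."""
    out = [math.copysign(abs(v) ** alpha, v) for v in vec]
    norm = math.sqrt(sum(v * v for v in out))
    return [v / norm for v in out] if norm > 0 else out


def fuse_concat(*encodings):
    """Assumed representation-level fusion: concatenate encoded vectors."""
    return [v for enc in encodings for v in enc]


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy stand-ins for two descriptor types (e.g. appearance and motion).
    train_a = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(300)]
    train_b = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(300)]
    book_a, book_b = kmeans(train_a, k=8), kmeans(train_b, k=8)
    video_a, video_b = train_a[:50], train_b[:50]  # one "video" per type
    fused = fuse_concat(power_l2(vlad(video_a, book_a)),
                        power_l2(vlad(video_b, book_b)))
    print(len(fused))  # 8*4 + 8*6 = 80
```

Normalizing each descriptor's encoding before concatenation keeps one descriptor type from dominating the fused vector purely because of its scale; a linear classifier would then be trained on the fused vectors. Swapping `vlad` for another encoder, or concatenation for score-level averaging, changes only one function, which mirrors the study's point that each stage is a separate, interchangeable choice.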