Journal:
Lecture Notes in Computer Science, 2014, Vol. 8691 LNCS (Part 3), pp. 660-674. ISSN: 0302-9743
Corresponding author:
Peng, Xiaojiang
Author affiliations:
[Peng, Qiang; Peng, Xiaojiang] Southwest Jiaotong Univ, Chengdu, Peoples R China.;[Wang, Limin] Chinese Univ Hong Kong, Dept Informat Engn, Hong Kong, Peoples R China.;[Qiao, Yu; Wang, Limin; Peng, Xiaojiang] Chinese Acad Sci, Shenzhen Key Lab CVPR, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China.;[Wang, Limin] Hengyang Normal Univ, Hengyang, Peoples R China.
Corresponding author affiliation:
[Peng, Xiaojiang] Southwest Jiaotong Univ, Chengdu, Peoples R China.
Conference:
European Conference on Computer Vision
Keywords:
Artificial intelligence;Computer science;Computers;Classification tasks;Dictionary learning;Dictionary learning algorithms;Efficient computation;High order statistics;Image-based objects;State-of-the-art performance;Vector of locally aggregated descriptors;Aggregates
Abstract:
Recent studies show that aggregating local descriptors into a super vector yields an effective representation for retrieval and classification tasks. A popular method along this line is the vector of locally aggregated descriptors (VLAD), which aggregates the residuals between descriptors and visual words. However, the original VLAD ignores high-order statistics of local descriptors, and its dictionary may not be optimal for classification tasks. In this paper, we address these problems by utilizing high-order statistics of local descriptors and performing supervised dictionary learning. The main contributions are twofold. First, we propose a high-order VLAD (H-VLAD) for visual recognition, which leverages two kinds of high-order statistics in a VLAD-like framework, namely diagonal covariance and skewness. These high-order statistics provide information complementary to VLAD and can be computed efficiently. Second, to further boost the performance of H-VLAD, we design a supervised dictionary learning algorithm that discriminatively refines the dictionary and can also be extended to other super-vector-based encoding methods. We examine the effectiveness of our methods on image-based object categorization and video-based action recognition. Extensive experiments on the PASCAL VOC 2007, HMDB51, and UCF101 datasets show that our method achieves state-of-the-art performance on both tasks.
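The abstract describes H-VLAD only at a high level: first-order VLAD residuals plus per-cluster diagonal covariance and skewness. The sketch below is a hypothetical illustration of that idea, not the paper's exact formulation. It assumes hard assignment of each descriptor to its nearest visual word, computes covariance and skewness per dimension around each cluster's empirical mean, and leaves normalization to a separate signed-square-root plus L2 step commonly used with VLAD. The function names (`nearest_center`, `hvlad_encode`, `power_l2_normalize`) are hypothetical, and the supervised dictionary learning step is not modeled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def nearest_center(x: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the visual word with the smallest squared Euclidean distance to x."""
    return min(
        range(len(centers)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])),
    )


def power_l2_normalize(v: Sequence[float], alpha: float = 0.5) -> Vector:
    """Signed power normalization followed by L2 normalization (assumed, VLAD-style)."""
    out = [math.copysign(abs(a) ** alpha, a) for a in v]
    norm = math.sqrt(sum(a * a for a in out))
    return [a / norm for a in out] if norm > 0 else out


def hvlad_encode(
    descriptors: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> Vector:
    """Hypothetical H-VLAD-style encoding: [first order | diag. covariance | skewness].

    Each block has K * D entries for K visual words and D descriptor dimensions.
    """
    K, D = len(centers), len(centers[0])
    # Hard-assign every descriptor to its nearest visual word.
    groups: List[List[Sequence[float]]] = [[] for _ in range(K)]
    for x in descriptors:
        groups[nearest_center(x, centers)].append(x)

    first: Vector = []
    second: Vector = []
    third: Vector = []
    for k in range(K):
        pts = groups[k]
        if not pts:
            # Empty cluster: contribute zeros to every block.
            first += [0.0] * D
            second += [0.0] * D
            third += [0.0] * D
            continue
        n = len(pts)
        # First order: summed residuals to the visual word (standard VLAD).
        first += [sum(p[d] - centers[k][d] for p in pts) for d in range(D)]
        # Per-dimension statistics around the cluster's empirical mean (assumption).
        mean = [sum(p[d] for p in pts) / n for d in range(D)]
        var = [sum((p[d] - mean[d]) ** 2 for p in pts) / n for d in range(D)]
        m3 = [sum((p[d] - mean[d]) ** 3 for p in pts) / n for d in range(D)]
        # Second order: diagonal covariance.
        second += var
        # Third order: skewness, set to 0 where the variance is (numerically) zero.
        third += [m3[d] / var[d] ** 1.5 if var[d] > eps else 0.0 for d in range(D)]
    return first + second + third
```

For example, with two 2-D visual words at `[0, 0]` and `[10, 10]` and descriptors `[0, 0]`, `[0, 0]`, `[3, 0]`, `[10, 11]`, the first cluster's first dimension has residual sum 3, variance 2, and skewness 1/√2, while the single-point second cluster contributes zero variance and skewness. Keeping the three blocks separate before normalization makes it easy to weight or normalize each order independently, which is one plausible design choice among several.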