Institutional Repository of Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (SIAT OpenIR): Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice
Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice
Peng, Xiaojiang; Wang, Limin; Wang, Xingxing; Qiao, Yu
2016
Source Publication: COMPUTER VISION AND IMAGE UNDERSTANDING
Subtype: Journal article
Abstract: Video-based action recognition is an important and challenging problem in computer vision research. The bag of visual words (BoVW) model with local features has long been popular and has achieved state-of-the-art performance on several realistic datasets, such as HMDB51, UCF50, and UCF101. BoVW is a general pipeline for constructing a global representation from local features, and it is composed of five main steps: (i) feature extraction, (ii) feature pre-processing, (iii) codebook generation, (iv) feature encoding, and (v) pooling and normalization. Although each step has been studied independently in different scenarios, the effects of these steps on action recognition are still unknown. Meanwhile, video data exhibits different views of visual patterns, such as static appearance and motion dynamics. Multiple descriptors are usually extracted to represent these different views, and fusing these descriptors is crucial for boosting the final performance of an action recognition system. This paper aims to provide a comprehensive study of all steps in BoVW and of different fusion methods, and to uncover good practices for producing a state-of-the-art action recognition system. Specifically, we explore two kinds of local features, ten kinds of encoding methods, eight kinds of pooling and normalization strategies, and three kinds of fusion methods. We conclude that every step contributes to the final recognition rate, and that an improper choice in one step may cancel out the performance improvement gained in other steps. Furthermore, based on our comprehensive study, we propose a simple yet effective representation, called the hybrid supervector, which exploits the complementarity of different BoVW frameworks with improved dense trajectories. Using this representation, we obtain impressive results on three challenging datasets: HMDB51 (61.9%), UCF50 (92.3%), and UCF101 (87.9%). (C) 2016 Elsevier Inc. All rights reserved.
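The five BoVW steps listed in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not name the specific encoding, pooling, or normalization methods studied, so the sketch assumes plain k-means for the codebook, hard-assignment (vector quantization) encoding, sum pooling, and power plus L2 normalization with an assumed exponent of 0.5. Random vectors stand in for real local descriptors, and all function names are hypothetical.

```python
import numpy as np


def build_codebook(features, k, iters=20, seed=0):
    """Step (iii): plain k-means (Lloyd's algorithm) over local descriptors."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct training descriptors.
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def encode(features, centers):
    """Step (iv): hard assignment, one one-hot code per local descriptor."""
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    codes = np.zeros((len(features), len(centers)))
    codes[np.arange(len(features)), labels] = 1.0
    return codes


def pool_and_normalize(codes, alpha=0.5):
    """Step (v): sum pooling, then power and L2 normalization (alpha assumed)."""
    pooled = codes.sum(axis=0)
    pooled = np.sign(pooled) * np.abs(pooled) ** alpha
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


# Step (i) stand-in: random vectors in place of real local descriptors.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 16))
video = rng.normal(size=(120, 16))

# Step (ii): standardize with statistics estimated on the training descriptors.
mean, std = train.mean(axis=0), train.std(axis=0) + 1e-8

centers = build_codebook((train - mean) / std, k=8)
rep = pool_and_normalize(encode((video - mean) / std, centers))
print(rep.shape)  # one fixed-length vector per video
```

The result is one fixed-length vector per video regardless of how many local descriptors the video produced, which is what allows a standard classifier to be trained on top of the representation.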
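Indexed By: SCI
Language: English
Department: Multimedia Integration Technology Research Center
Document Type: Journal article
Identifier: http://ir.siat.ac.cn/handle/172644/9801
Collection: 集成所 (Institute of Integration Technology)
Affiliation: COMPUTER VISION AND IMAGE UNDERSTANDING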
Recommended Citation
GB/T 7714: Peng, Xiaojiang, Wang, Limin, Wang, Xingxing, et al. Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice[J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2016.
APA: Peng, Xiaojiang, Wang, Limin, Wang, Xingxing, & Qiao, Yu. (2016). Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice. COMPUTER VISION AND IMAGE UNDERSTANDING.
MLA: Peng, Xiaojiang, et al. "Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice". COMPUTER VISION AND IMAGE UNDERSTANDING (2016).
Files in This Item:
集成-多媒体2016006.pdf (2737 KB); Access: Open access; License: CC BY-NC-SA