Luo, Sheng; Yang, Haojin; Wang, Cheng; Che, Xiaoyin; Meinel, Christoph
Artificial Neural Networks and Machine Learning – ICANN 2016
With the rapid increase in the number of surveillance cameras, the volume of surveillance video is growing quickly. How to recognize semantic actions and events in surveillance videos automatically and efficiently has therefore become an important problem. In this paper, we investigate state-of-the-art Deep Learning (DL) approaches to human action recognition and propose an improved two-stream ConvNets architecture for this task. In particular, we propose using the Motion History Image (MHI) as the motion representation for training the temporal ConvNet, which achieves impressive results in both accuracy and recognition speed. In our experiments, we conduct an in-depth study of important network options and compare our approach with the latest deep network for action recognition. The detailed evaluation results demonstrate the superior performance of the proposed approach, which achieves state-of-the-art results in the surveillance video context.
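For readers unfamiliar with the Motion History Image, the following is a minimal sketch of the commonly used textbook formulation, in which recently moving pixels are bright and older motion fades linearly. It is not taken from the paper: the function names, the frame-differencing motion mask, and the values of `tau` and `diff_threshold` are all illustrative assumptions, and the authors' actual preprocessing may differ.

```python
import numpy as np


def update_mhi(mhi, prev_frame, frame, tau=10, diff_threshold=30):
    """Apply one MHI update step (tau and diff_threshold are assumed values)."""
    # A pixel counts as "moving" when its absolute intensity change
    # between consecutive frames exceeds the threshold.
    diff = np.abs(frame.astype(np.int32) - prev_frame.astype(np.int32))
    moving = diff > diff_threshold
    # Moving pixels are reset to tau; all others decay by one, floored at zero.
    return np.where(moving, tau, np.maximum(mhi - 1, 0))


def motion_history_image(frames, tau=10, diff_threshold=30):
    """Collapse a grayscale clip of shape (T, H, W) into a single MHI in [0, 1]."""
    frames = np.asarray(frames)
    mhi = np.zeros(frames.shape[1:], dtype=np.int32)
    for prev_frame, frame in zip(frames[:-1], frames[1:]):
        mhi = update_mhi(mhi, prev_frame, frame, tau, diff_threshold)
    # Normalise so the result can be used as a single-channel network input.
    return mhi.astype(np.float32) / tau
```

Under these assumptions, a whole clip is summarised by one image, whereas a stacked optical-flow input needs a flow field per frame pair. This is one plausible reason an MHI-based temporal stream could be fast at recognition time.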