Hierarchical spatio-temporal context modeling for action recognition

Ju Sun, Xiao Wu, Shuicheng Yan, Loong Fah Cheong, Tat Seng Chua, Jintao Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

355 Scopus citations


The problem of recognizing actions in realistic videos is challenging yet absorbing owing to its great potential in many practical applications. Most previous research is limited either by its use of simplified action databases collected under controlled environments or by its focus on excessively localized features that do not sufficiently capture the spatio-temporal context. In this paper, we propose to model the spatio-temporal context information in a hierarchical way, where three levels of context are exploited in ascending order of abstraction: 1) point-level context (SIFT average descriptor), 2) intra-trajectory context (trajectory transition descriptor), and 3) inter-trajectory context (trajectory proximity descriptor). To obtain efficient and compact representations for the latter two levels, we encode the spatio-temporal context information into the transition matrix of a Markov process, and then extract its stationary distribution as the final context descriptor. Building on multichannel nonlinear SVMs, we validate the proposed hierarchical framework on the realistic action (HOHA) and event (LSCOM) recognition databases, and achieve 27% and 66% relative performance improvements over the state-of-the-art results, respectively. We further propose to employ the Multiple Kernel Learning (MKL) technique to prune the kernels and thereby speed up algorithm evaluation.
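The abstract's step of turning a Markov transition matrix into a stationary-distribution descriptor can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stationary_distribution`, the additive `smoothing` term, the power-iteration solver, and the toy count matrix are all assumptions made for illustration. How the paper actually builds its transition matrices from trajectories is not stated in this record.

```python
import numpy as np


def stationary_distribution(counts, smoothing=1e-6, tol=1e-12, max_iter=10000):
    """Return the stationary distribution of a Markov chain.

    `counts` is a square matrix of (hypothetical) transition counts between
    quantized states. The small `smoothing` term is an assumption of this
    sketch: it makes every transition probability positive, so the chain has
    a unique stationary distribution and power iteration converges.
    """
    counts = np.asarray(counts, dtype=float) + smoothing
    # Row-normalize counts into a row-stochastic transition matrix P.
    P = counts / counts.sum(axis=1, keepdims=True)
    # Start from the uniform distribution and iterate pi <- pi P.
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        converged = np.abs(nxt - pi).sum() < tol
        pi = nxt
        if converged:
            break
    return pi / pi.sum()


# Toy example: transition counts among three hypothetical quantized states.
toy_counts = np.array([[8, 2, 0],
                       [1, 6, 3],
                       [0, 4, 6]])
descriptor = stationary_distribution(toy_counts)
print(descriptor)
```

The resulting vector has one entry per state regardless of how many transitions were observed, which is one way to read the abstract's claim that the descriptor is compact.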

Original language: English (US)
Title of host publication: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009
Publisher: IEEE Computer Society
Number of pages: 8
ISBN (Print): 9781424439935
State: Published - 2009
Externally published: Yes
Event: 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009 - Miami, FL, United States
Duration: Jun 20 2009 to Jun 25 2009

Publication series

Name: 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009


Conference: 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009
Country/Territory: United States
City: Miami, FL

