Unsupervised Learning of View-invariant Action Representations

Junnan Li, Qi Zhao, Yongkang Wong, Mohan S. Kankanhalli

Research output: Contribution to journal > Conference article > peer-review

62 Scopus citations


Recent successes in human action recognition with deep learning methods mostly rely on the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an expensive and time-consuming process. In this work, we propose an unsupervised learning framework that exploits unlabeled data to learn video representations. Unlike previous work in video representation learning, our unsupervised learning task is to predict 3D motion in multiple target views using a video representation from a source view. By learning to extrapolate cross-view motions, the representation can capture view-invariant motion dynamics that are discriminative for the action. In addition, we propose a view-adversarial training method to enhance the learning of view-invariant features. We demonstrate the effectiveness of the learned representations for action recognition on multiple datasets.

Original language: English (US)
Pages (from-to): 1254-1264
Number of pages: 11
Journal: Advances in Neural Information Processing Systems
State: Published - 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2 2018 - Dec 8 2018

Bibliographical note

Funding Information:
This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its Strategic Capability Research Centres Funding Initiative.

Publisher Copyright:
© 2018 Curran Associates Inc. All rights reserved.

Copyright 2019 Elsevier B.V., All rights reserved.

