Attention transfer from web images for video recognition

Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Scopus citations

Abstract

Training deep learning based video classifiers for action recognition requires a large amount of labeled videos, and the labeling process is labor-intensive and time-consuming. On the other hand, large amounts of weakly-labeled images are uploaded to the Internet by users every day. To harness this rich and highly diverse set of Web images, a scalable approach is to crawl them to train deep learning based classifiers, such as Convolutional Neural Networks (CNNs). However, due to the domain shift problem, the performance of deep classifiers trained on Web images tends to degrade when they are directly deployed on videos. One way to address this problem is to fine-tune the trained models on videos, but a sufficient amount of annotated videos is still required. In this work, we propose a novel approach to transfer knowledge from the image domain to the video domain. The proposed method can adapt to the target domain (i.e., video data) with a limited amount of training data. Our method maps video frames into a low-dimensional feature space using the class-discriminative spatial attention maps of CNNs. We design a novel Siamese EnergyNet structure to learn energy functions on the attention maps by jointly optimizing two loss functions, such that the attention map corresponding to a ground-truth concept has higher energy. We conduct extensive experiments on two challenging video recognition datasets (i.e., TVHI and UCF101) and demonstrate the efficacy of our proposed method.
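To make the idea concrete, the following is a minimal illustrative sketch (not the authors' code) of an energy function over CNN attention maps trained with two jointly optimized losses: a ranking term that assigns higher energy to the ground-truth concept's attention map, and a Siamese pairwise term over attention-map pairs. All function names, the linear form of the energy, the 7x7 map size, and the margins are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(att_map, w):
    """Scalar energy of an attention map under class-specific weights w.
    (Assumed linear here purely for illustration.)"""
    return float(att_map.ravel() @ w)

def ranking_loss(att_map, w_true, w_other, margin=1.0):
    """Hinge loss: per the abstract, the attention map of the ground-truth
    concept should have HIGHER energy than that of any other concept."""
    return max(0.0, margin + energy(att_map, w_other) - energy(att_map, w_true))

def siamese_loss(att_a, att_b, w, same_class, margin=1.0):
    """Pairwise (Siamese) term: pull energies together for same-class
    attention-map pairs, push them apart up to a margin otherwise."""
    diff = abs(energy(att_a, w) - energy(att_b, w))
    return diff if same_class else max(0.0, margin - diff)

# Toy 7x7 spatial attention maps (a common CNN feature-map resolution)
# and randomly initialized class weights -- purely synthetic data.
att1 = rng.random((7, 7))
att2 = rng.random((7, 7))
w_true = rng.standard_normal(49) * 0.01
w_other = rng.standard_normal(49) * 0.01

# Joint objective: sum of the two loss terms for one training pair.
total = ranking_loss(att1, w_true, w_other) + siamese_loss(att1, att2, w_true, same_class=True)
print(round(total, 4))
```

In the paper's setting the energy function is a learned network rather than a linear map, and the two losses are optimized jointly over the Siamese branches; the sketch only shows how the margin-based ranking and pairwise terms combine.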

Original language: English (US)
Title of host publication: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 1-9
Number of pages: 9
ISBN (Electronic): 9781450349062
DOIs
State: Published - Oct 23 2017
Event: 25th ACM International Conference on Multimedia, MM 2017 - Mountain View, United States
Duration: Oct 23 2017 - Oct 27 2017

Publication series

Name: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

Other

Other: 25th ACM International Conference on Multimedia, MM 2017
Country: United States
City: Mountain View
Period: 10/23/17 - 10/27/17

Bibliographical note

Funding Information:
This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its International Research Centre in Singapore Funding Initiative.

Keywords

  • Action recognition
  • Attention map
  • Domain adaptation

