A novel deep convolutional neural network is proposed to predict gaze on the current frame in egocentric videos. Inspired by the human visual system, we introduce a fovea module responsible for sharp central vision and name our model the Foveated Neural Network (FNN). The retina-like visual input from the region of interest on the previous frame is analysed and encoded. The fusion of the hidden representations of the previous frame and the feature maps of the current frame guides gaze prediction on the current frame. To capture motion, we also include the dense optical flow between these adjacent frames as an additional input. Experimental results show that FNN outperforms state-of-the-art algorithms on a publicly available egocentric dataset. The analysis of FNN demonstrates that the hidden representations of the foveated visual input from the previous frame, as well as the motion information between adjacent frames, are effective in improving gaze prediction performance in egocentric videos.
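The abstract describes fusing three inputs: features of the current frame, an encoding of the foveated crop around the previous gaze point, and dense optical flow between the two frames. The following is a minimal NumPy sketch of that input assembly only; the function names, shapes, and the broadcast-and-concatenate fusion are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def foveate(frame, gaze_xy, radius):
    """Crop a square 'foveal' region around the previous gaze point, clamped to the frame.
    (Hypothetical helper; the paper's fovea module is a learned component.)"""
    h, w = frame.shape[:2]
    x, y = gaze_xy
    x0, x1 = max(0, x - radius), min(w, x + radius)
    y0, y1 = max(0, y - radius), min(h, y + radius)
    return frame[y0:y1, x0:x1]

def build_fnn_input(curr_frame, prev_fovea_code, flow):
    """Stack current-frame pixels, a broadcast copy of the previous foveal
    encoding, and the two-channel dense flow into one input tensor.
    (Schematic fusion; the paper fuses learned hidden representations.)"""
    h, w = curr_frame.shape[:2]
    code_map = np.broadcast_to(prev_fovea_code, (h, w, prev_fovea_code.shape[-1]))
    return np.concatenate([curr_frame, code_map, flow], axis=-1)

# Toy example: a 64x64 RGB frame, an 8-d foveal code, and a 2-channel flow field.
prev_frame = np.zeros((64, 64, 3))
curr_frame = np.zeros((64, 64, 3))
flow = np.zeros((64, 64, 2))
crop = foveate(prev_frame, (32, 32), radius=8)      # 16x16x3 foveal patch
code = np.zeros(8)                                   # stand-in for its encoding
fused = build_fnn_input(curr_frame, code, flow)      # shape (64, 64, 13)
```

In a real model the crop would pass through the fovea module to produce the code, and the fused tensor would feed the prediction network; here the shapes simply show how the three information sources described in the abstract combine.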
|Original language||English (US)|
|Title of host publication||2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings|
|Publisher||IEEE Computer Society|
|Number of pages||5|
|State||Published - Feb 20 2018|
|Event||24th IEEE International Conference on Image Processing, ICIP 2017 - Beijing, China|
Duration: Sep 17 2017 → Sep 20 2017
|Name||Proceedings - International Conference on Image Processing, ICIP|
|Other||24th IEEE International Conference on Image Processing, ICIP 2017|
|Period||9/17/17 → 9/20/17|
Bibliographical note
Funding Information:
This work was supported by the Reverse Engineering Visual Intelligence for cognitive Enhancement (REVIVE) programme funded by the Joint Council Office of A*STAR, Singapore.
© 2017 IEEE.
- Egocentric Videos
- Visual Attention