Learning to Predict Sequences of Human Visual Fixations

Ming Jiang, Xavier Boix, Gemma Roig, Juan Xu, Luc Van Gool, Qi Zhao

Research output: Contribution to journal › Article › peer-review

43 Scopus citations


Most state-of-the-art visual attention models estimate the probability of fixating each location of an image, producing so-called saliency maps. However, these models do not predict the temporal sequence of eye fixations, which could help predict human eye fixations more accurately and clarify the role that different cues play during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations that is learned from recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples. The model uses a separate set of parameters for each stage of visual exploration, and these parameters capture the importance of each cue along the scanpath. In a series of experiments, we demonstrate the effectiveness of LSPI for combining multiple cues at different stages of the scanpath. The learned parameters suggest that low-level and high-level (semantic) cues are similarly important at the first fixation of the scanpath, and that the contribution of high-level cues keeps increasing during visual exploration. Results show that our approach achieves state-of-the-art performance on two challenging data sets: 1) the OSIE data set and 2) the MIT data set.
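The abstract names least-squares policy iteration (LSPI) as the learning method. As a rough illustration of that technique only, the sketch below implements generic LSPI: it alternates LSTD-Q policy evaluation (solving a linear system for Q-function weights) with greedy policy improvement. It runs on a hypothetical 5-state toy chain; the states, the one-hot features `phi`, the reward, and the discount `gamma` are all assumptions for illustration, not the paper's fixation states, cue features, or reward. The sketch also does not model the paper's separate parameters for each stage of the scanpath.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): 5 states in a chain,
# action 0 moves left, action 1 moves right, reward 1 on reaching the last state.
N_STATES, N_ACTIONS = 5, 2


def phi(s, a):
    """One-hot feature vector for a (state, action) pair (illustrative choice)."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f


def step(s, a):
    """Deterministic toy transition; returns (next_state, reward)."""
    s_next = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s_next, 1.0 if s_next == N_STATES - 1 else 0.0


def lstdq(samples, phi, policy, n_features, gamma=0.9, reg=1e-6):
    """LSTD-Q: evaluate `policy` by solving A w = b for linear Q-weights."""
    A = reg * np.eye(n_features)  # small ridge term keeps A well conditioned
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)


def lspi(samples, phi, n_features, n_actions, gamma=0.9, max_iter=20, tol=1e-8):
    """Alternate LSTD-Q evaluation and greedy improvement until weights settle."""
    w = np.zeros(n_features)
    for _ in range(max_iter):
        # Greedy policy with respect to the current linear Q estimate.
        policy = lambda s, w=w: int(np.argmax([phi(s, a) @ w for a in range(n_actions)]))
        w_new = lstdq(samples, phi, policy, n_features, gamma)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w


# Collect one sample per (state, action) pair from the toy chain.
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        s_next, r = step(s, a)
        samples.append((s, a, r, s_next))

w = lspi(samples, phi, N_STATES * N_ACTIONS, N_ACTIONS)
policy = [int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)])) for s in range(N_STATES)]
print(policy)  # [1, 1, 1, 1, 1]: always move right toward the rewarded state
```

Because LSPI only requires a batch of recorded `(s, a, r, s_next)` tuples, it fits a setting where a policy is learned from logged demonstrations rather than from online interaction. Replacing the one-hot `phi` with per-location cue values would make the learned weights readable as relative cue importances, which resembles the interpretation given in the abstract; that mapping is an assumption of this sketch.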

Original language: English (US)
Article number: 7374716
Pages (from-to): 1241-1252
Number of pages: 12
Journal: IEEE Transactions on Neural Networks and Learning Systems
Issue number: 6
State: Published - Jun 2016

Bibliographical note

Publisher Copyright:
© 2016 IEEE.


Keywords

  • Scanpath prediction
  • Visual saliency prediction

