Boosted Attention: Leveraging Human Attention for Image Captioning

Shi Chen, Qi Zhao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Scopus citations


Visual attention has proven useful in image captioning, where it enables a caption model to focus selectively on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly by optimizing the captioning objectives. While somewhat effective, the learned top-down attention can fail to focus on the correct regions of interest because attention itself receives no direct supervision. Inspired by the human visual system, which is driven not only by task-specific top-down signals but also by visual stimuli, we propose to use both types of attention for image captioning. In particular, we highlight the complementary nature of the two types of attention and develop a model (Boosted Attention) to integrate them for image captioning. We validate the proposed approach with state-of-the-art performance across various evaluation metrics.

Original language: English (US)
Title of host publication: Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
Editors: Vittorio Ferrari, Cristian Sminchisescu, Yair Weiss, Martial Hebert
Publisher: Springer Verlag
Number of pages: 17
ISBN (Print): 9783030012519
State: Published - 2018
Event: 15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration: Sep 8 2018 to Sep 14 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11215 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 15th European Conference on Computer Vision, ECCV 2018

Bibliographical note

Funding Information:
Acknowledgements. This work is supported by NSF Grant 1763761 and University of Minnesota Department of Computer Science and Engineering Start-up Fund (QZ).

Publisher Copyright:
© 2018, Springer Nature Switzerland AG.


  • Human attention
  • Image captioning
  • Visual attention

