Leveraging Human Attention in Novel Object Captioning

Xianyu Chen, Ming Jiang, Qi Zhao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation


Image captioning models depend on training with paired image-text corpora, which poses various challenges in describing images containing novel objects absent from the training data. While previous novel object captioning methods rely on external image taggers or object detectors to describe novel objects, we present the Attention-based Novel Object Captioner (ANOC) that complements novel object captioners with human attention features that characterize generally important information independent of tasks. It introduces a gating mechanism that adaptively incorporates human attention with self-learned machine attention, with a Constrained Self-Critical Sequence Training method to address the exposure bias while maintaining constraints of novel object descriptions. Extensive experiments conducted on the nocaps and Held-Out COCO datasets demonstrate that our method considerably outperforms the state-of-the-art novel object captioners. Our source code is available at https://github.com/chenxy99/ANOC.
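
The gating idea in the abstract can be illustrated with a minimal sketch. This is not the ANOC implementation (see the authors' repository for that); it only shows one common way to blend two attention distributions over the same image regions with a sigmoid gate. The function name `gated_attention` and the scalar `gate_logit` are hypothetical: in a real model the gate would presumably be computed by a learned layer from the decoder state rather than passed in by hand.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention(human, machine, gate_logit):
    """Blend two attention distributions over the same image regions.

    human      -- attention weights derived from human attention (assumed input)
    machine    -- attention weights learned by the captioning model (assumed input)
    gate_logit -- hypothetical scalar; larger values favor human attention
    """
    if len(human) != len(machine):
        raise ValueError("attention vectors must cover the same regions")
    g = sigmoid(gate_logit)
    # Convex combination: g weights human attention, (1 - g) machine attention.
    fused = [g * h + (1.0 - g) * m for h, m in zip(human, machine)]
    total = sum(fused)
    if total == 0:
        raise ValueError("fused attention has zero mass")
    # Renormalize so the result is a distribution even if the inputs were not.
    return [f / total for f in fused]


if __name__ == "__main__":
    human = [0.5, 0.5]
    machine = [1.0, 0.0]
    print(gated_attention(human, machine, 0.0))
```

With `gate_logit = 0.0` the gate is 0.5, so the two sources contribute equally; a strongly positive logit makes the output follow human attention, and a strongly negative one makes it follow machine attention.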

Original language: English (US)
Title of host publication: Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
Editors: Zhi-Hua Zhou
Publisher: International Joint Conferences on Artificial Intelligence
Number of pages: 7
ISBN (Electronic): 9780999241196
State: Published - 2021
Event: 30th International Joint Conference on Artificial Intelligence, IJCAI 2021 - Virtual, Online, Canada
Duration: Aug 19 2021 to Aug 27 2021

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
ISSN (Print): 1045-0823


Conference: 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
City: Virtual, Online

Bibliographical note

Funding Information:
This work is supported by NSF Grant 1908711.

Publisher Copyright:
© 2021 International Joint Conferences on Artificial Intelligence. All rights reserved.

