Visual attention in multi-label image classification

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

One of the most significant challenges in multi-label image classification is learning representative features that capture the rich semantic information in a cluttered scene. As an information bottleneck, the visual attention mechanism allows humans to selectively process the most important visual input, enabling rapid and accurate scene understanding. In this work, we study the correlation between visual attention and multi-label image classification, and exploit an extra attention pathway to improve multi-label classification performance. Specifically, we propose a dual-stream neural network that consists of two sub-networks: one is a conventional classification model and the other is a saliency prediction model trained with human fixations. The two sub-networks are first trained separately, and their features are then fine-tuned jointly using a multiple cross-entropy loss. Experimental results show that the additional saliency sub-network improves multi-label image classification performance on the MS COCO dataset. The improvement is consistent across various levels of scene clutter.
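The dual-stream design described in the abstract — a classification feature stream and a saliency feature stream fused for multi-label prediction — can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimensions, the fusion by concatenation, and the reading of "multiple cross entropy loss" as per-label binary cross-entropy are all assumptions based only on the abstract.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_label_bce(probs, targets, eps=1e-12):
    """Per-label binary cross-entropy, averaged over labels."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p)))

rng = np.random.default_rng(0)

# Hypothetical feature vectors from the two sub-networks.
cls_feat = rng.standard_normal(512)  # conventional classification stream
sal_feat = rng.standard_normal(512)  # saliency stream (trained on human fixations)

# Fuse by concatenation, then apply a jointly fine-tuned linear classifier
# over C labels (80 object categories in MS COCO).
C = 80
W = rng.standard_normal((C, 1024)) * 0.01
b = np.zeros(C)

fused = np.concatenate([cls_feat, sal_feat])
probs = sigmoid(W @ fused + b)  # independent per-label probabilities

# Example multi-label ground truth: three labels present in the image.
targets = np.zeros(C)
targets[[0, 17, 56]] = 1.0
loss = multi_label_bce(probs, targets)
```

Treating each label as an independent sigmoid output (rather than a softmax over classes) is the standard multi-label formulation, which is consistent with the per-label cross-entropy terms the abstract alludes to.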

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
Publisher: IEEE Computer Society
Pages: 820-827
Number of pages: 8
ISBN (Electronic): 9781728125060
DOIs
State: Published - Jun 2019
Event: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019 - Long Beach, United States
Duration: Jun 16, 2019 - Jun 20, 2019

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume: 2019-June
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
Country/Territory: United States
City: Long Beach
Period: 6/16/19 - 6/20/19

Bibliographical note

Funding Information:
This research was funded by the NSF under Grants 1849107 and 1763761, and the University of Minnesota Department of Computer Science and Engineering Start-up Fund (QZ).
