Explicit Knowledge Incorporation for Visual Reasoning

Yifeng Zhang, Ming Jiang, Qi Zhao

Research output: Chapter in Book/Report/Conference proceedingConference contribution

6 Scopus citations


Existing explainable and explicit visual reasoning methods only perform reasoning based on visual evidence but do not take into account knowledge beyond what is in the visual scene. To addresses the knowledge gap between visual reasoning methods and the semantic complexity of real-world images, we present the first explicit visual reasoning method that incorporates external knowledge and models high-order relational attention for improved generalizability and explainability. Specifically, we propose a knowledge incorporation network that explicitly creates and includes new graph nodes for entities and predicates from external knowledge bases to enrich the semantics of the scene graph used in explicit reasoning. We then create a novel Graph-Relate module to perform high-order relational attention on the enriched scene graph. By explicitly introducing structured external knowledge and high-order relational attention, our method demonstrates significant generalizability and explainability over the state-of-the-art visual reasoning approaches on the GQA and VQAv2 datasets.

Original languageEnglish (US)
Title of host publicationProceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
PublisherIEEE Computer Society
Number of pages10
ISBN (Electronic)9781665445092
StatePublished - 2021
Event2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 - Virtual, Online, United States
Duration: Jun 19 2021Jun 25 2021

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)1063-6919


Conference2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Country/TerritoryUnited States
CityVirtual, Online

Bibliographical note

Funding Information:
This work is supported by NSF Grants 1908711.

Publisher Copyright:
© 2021 IEEE


Dive into the research topics of 'Explicit Knowledge Incorporation for Visual Reasoning'. Together they form a unique fingerprint.

Cite this