REX: Reasoning-aware and Grounded Explanation

Shi Chen, Qi Zhao

Research output: Chapter in Book/Report/Conference proceedingConference contribution

11 Scopus citations

Abstract

Effectiveness and interpretability are two essential properties for trustworthy AI systems. Most recent studies in visual reasoning are dedicated to improving the accuracy of predicted answers, and less attention is paid to explaining the rationales behind the decisions. As a result, they commonly take advantage of spurious biases instead of actually reasoning on the visual-textual data, and have yet developed the capability to explain their decision making by considering key information from both modalities. This paper aims to close the gap from three distinct perspectives: first, we define a new type of multi-modal explanations that explain the decisions by progressively traversing the reasoning process and grounding keywords in the images. We develop a functional program to sequentially ex-ecute different reasoning steps and construct a new dataset with 1,040,830 multi-modal explanations. Second, we iden-tify the critical need to tightly couple important components across the visual and textual modalities for explaining the decisions, and propose a novel explanation generation method that explicitly models the pairwise correspon-dence between words and regions of interest. It improves the visual grounding capability by a considerable margin, resulting in enhanced interpretability and reasoning performance. Finally, with our new data and method, we perform extensive analyses to study the effectiveness of our explanation under different settings, including multi-task learning and transfer learning. Our code and data are available at https://github.com/szzexpoi/rex.

Original languageEnglish (US)
Title of host publicationProceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PublisherIEEE Computer Society
Pages15565-15574
Number of pages10
ISBN (Electronic)9781665469463
DOIs
StatePublished - 2022
Event2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States
Duration: Jun 19 2022Jun 24 2022

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume2022-June
ISSN (Print)1063-6919

Conference

Conference2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/TerritoryUnited States
CityNew Orleans
Period6/19/226/24/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Keywords

  • Vision + language
  • Visual reasoning

Fingerprint

Dive into the research topics of 'REX: Reasoning-aware and Grounded Explanation'. Together they form a unique fingerprint.

Cite this