While attention has become an increasingly popular component of deep neural networks, used both to interpret models and to boost their performance, little work has examined how attention progresses to accomplish a task and whether that progression is reasonable. In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes. We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling quantitative measurement of attention that takes the reasoning process into account. We then collect human eye-tracking and answer correctness data, and analyze machine and human attention in terms of reasoning capability and impact on task performance. Furthermore, we propose a supervision method that jointly and progressively optimizes attention, reasoning, and task performance so that models learn to look at regions of interest by following a reasoning process. We demonstrate the effectiveness of the proposed framework in analyzing and modeling attention with better reasoning capability and task performance. The code and data are available at https://github.com/szzexpoi/AiR.
|Original language||English (US)|
|Title of host publication||Computer Vision – ECCV 2020 - 16th European Conference, 2020, Proceedings|
|Editors||Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm|
|Publisher||Springer Science and Business Media Deutschland GmbH|
|Number of pages||17|
|State||Published - 2020|
|Event||16th European Conference on Computer Vision, ECCV 2020 - Glasgow, United Kingdom. Duration: Aug 23 2020 → Aug 28 2020|
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Conference||16th European Conference on Computer Vision, ECCV 2020|
|Period||8/23/20 → 8/28/20|
Bibliographical note
Funding Information:
This work is supported by NSF Grants 1908711 and 1849107.
- Eye-tracking dataset