TY - JOUR
T1 - Attention in Reasoning
T2 - Dataset, Analysis, and Modeling
AU - Chen, Shi
AU - Jiang, Ming
AU - Yang, Jinhui
AU - Zhao, Qi
N1 - Publisher Copyright: © 1979-2012 IEEE.
PY - 2022/11/1
Y1 - 2022/11/1
N2 - While attention has been an increasingly popular component in deep neural networks to both interpret and boost the performance of models, little work has examined how attention progresses to accomplish a task and whether it is reasonable. In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes. We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling a quantitative measurement of attention that considers the reasoning process. We then collect human eye-tracking and answer correctness data, and analyze various machine and human attention mechanisms on their reasoning capability and how they impact task performance. To improve the attention and reasoning ability of visual question answering models, we propose to supervise the learning of attention progressively along the reasoning process and to differentiate the correct and incorrect attention patterns. We demonstrate the effectiveness of the proposed framework in analyzing and modeling attention with better reasoning capability and task performance. The code and data are available at https://github.com/szzexpoi/AiR.
KW - Attention
KW - eye-tracking dataset
KW - reasoning
UR - http://www.scopus.com/inward/record.url?scp=85115683750&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85115683750&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2021.3114582
DO - 10.1109/TPAMI.2021.3114582
M3 - Article
C2 - 34550881
AN - SCOPUS:85115683750
SN - 0162-8828
VL - 44
SP - 7310
EP - 7326
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 11
ER -