This paper presents a method to predict social saliency, the likelihood of joint attention, given an input image or video by leveraging the social interaction data captured by first-person cameras. Inspired by electric dipole moments, we introduce a social formation feature that encodes the geometric relationship between joint attention and its social formation. We learn this feature from the first-person social interaction data where we can precisely measure the locations of joint attention and its associated members in 3D. An ensemble classifier is trained to learn the geometric relationship. Using the trained classifier, we predict social saliency in real-world scenes with multiple social groups, including scenes from team sports captured in a third-person view. Our representation does not require directional measurements such as gaze directions. A geometric analysis of social interactions in terms of the F-formation theory is also presented.
|Original language||English (US)|
|Title of host publication||IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015|
|Publisher||IEEE Computer Society|
|Number of pages||9|
|State||Published - Oct 14 2015|
|Event||IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015 - Boston, United States|
|Duration||Jun 7 2015 → Jun 12 2015|
|Name||Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition|
|Bibliographical note||Publisher Copyright: © 2015 IEEE.|