Abstract
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 |
Publisher | IEEE Computer Society |
Pages | 18973-18990 |
Number of pages | 18 |
ISBN (Electronic) | 9781665469463 |
DOIs | |
State | Published - 2022 |
Event | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States. Duration: Jun 19 2022 → Jun 24 2022 |
Publication series
Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
---|---|
Volume | 2022-June |
ISSN (Print) | 1063-6919 |
Conference
Conference | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 |
---|---|
Country/Territory | United States |
City | New Orleans |
Period | 6/19/22 → 6/24/22 |
Bibliographical note
Funding Information: UNICT is supported by MIUR AIM - Attrazione e Mobilità Internazionale Linea 1 - AIM1893589 - CUP E64118002540007. Bristol is supported by UKRI Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Program (DTP), EPSRC Fellowship UMPIRE (EP/T004991/1). KAUST is supported by the KAUST Office of Sponsored Research through the Visual Computing Center (VCC) funding. National University of Singapore is supported by Mike Shou’s Start-Up Grant. Georgia Tech is supported in part by NSF 2033413 and NIH R01MH114999.
Publisher Copyright:
© 2022 IEEE.
Keywords
- Datasets and evaluation
- Video analysis and understanding