Force from Motion: Decoding Control Force of Activity in a First Person Video

Hyun Park, Jianbo Shi

Research output: Contribution to journal › Article

Abstract

A first-person video conveys what the camera wearer (the actor) experiences through physical interactions with the surroundings. In this paper, we focus on the problem of Force from Motion: estimating, from a first-person video, the active force and torque that the actor exerts to drive his or her activity. We use two physical cues inherent in the first-person video. (1) Ego-motion: the camera motion is generated by the resultant of the forces acting on the actor, which allows us to reason about the effect of the active force using Newtonian mechanics. (2) Visual semantics: the first-person scene is laid out to afford the actor's activity, and is therefore indicative of the activity's physical context. We estimate the active force and torque with a dynamical system that describes the transition (dynamics) of the actor's physical state (position, orientation, and linear/angular momentum), where the latent physical state is observed only indirectly through the first-person video. We approximate the physical state by the 3D camera trajectory, which is reconstructed up to scale and orientation. The absolute scale factor and the gravitational field are learned from the ego-motion and visual semantics of the video. Inspired by optimal control theory, we solve the dynamical system by minimizing reprojection error. Our method recovers gravity and scale with accuracy quantitatively comparable to IMU measurements, and it outperforms methods based on 2D optical flow on an active action recognition task. We apply our method to first-person videos of mountain biking, urban bike racing, skiing, speedflying with a parachute, and wingsuit flying, where inertial measurements are not available.
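
The abstract compresses the method into one pipeline: roll a physical state forward under an unknown active force, and pick the force sequence whose induced camera trajectory best reprojects the observed scene. As a worked reduction (our simplification, not the paper's full model, which also carries orientation and angular momentum with torque as a control), assume a point mass m at position p_t with velocity v_t, tracked 2D observations u_t of known 3D points X, and a projection \pi. The optimal control problem then reads:

\[
\min_{\{f_t\}}\;\sum_{t=1}^{T}\bigl\|\pi(X;\,p_t) - u_t\bigr\|^2
\quad\text{s.t.}\quad
v_{t+1} = v_t + \Bigl(\tfrac{f_t}{m} + g\Bigr)\Delta t,\qquad
p_{t+1} = p_t + v_{t+1}\,\Delta t.
\]

Below is a minimal sketch of this reduced problem, not the authors' implementation: camera rotation is fixed to identity, and the landmarks, intrinsics, mass, and gravity are synthetic or given (the paper learns scale and gravity from ego-motion and visual semantics). All names are illustrative.

```python
# Minimal point-mass "Force from Motion" sketch: recover an active force
# sequence from 2D reprojections by rolling out Newtonian dynamics and
# minimizing reprojection error (plus a small control regularizer).
import numpy as np
from scipy.optimize import minimize

np.random.seed(0)
DT, MASS, T = 0.1, 75.0, 20                       # time step [s], mass [kg], frames
GRAVITY = np.array([0.0, 0.0, -9.8])              # given here; learned in the paper
LANDMARKS = np.random.randn(40, 3) * 2 + np.array([0.0, 0.0, 20.0])  # points in front
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1.0]])        # pinhole intrinsics

def rollout(forces):
    """Integrate the point-mass dynamics v' = f/m + g, p' = v (semi-implicit Euler)."""
    p, v, traj = np.zeros(3), np.zeros(3), []
    for f in forces.reshape(T, 3):
        v = v + DT * (f / MASS + GRAVITY)         # momentum update from the active force
        p = p + DT * v                            # position update
        traj.append(p.copy())
    return np.array(traj)

def reproject(p_cam):
    """Project landmarks into a camera at p_cam with identity rotation."""
    rel = LANDMARKS - p_cam
    uv = (K @ rel.T).T
    return uv[:, :2] / uv[:, 2:3]

# Synthetic "video": 2D tracks generated by a ground-truth force that pushes
# the actor sideways while exactly cancelling gravity.
f_true = np.tile([20.0, 0.0, -MASS * GRAVITY[2]], T)
observed = [reproject(p) for p in rollout(f_true)]

def cost(forces):
    """Reprojection error along the trajectory plus a small control regularizer."""
    traj = rollout(forces)
    err = sum(np.sum((reproject(p) - o) ** 2) for p, o in zip(traj, observed))
    return err + 1e-6 * np.sum(forces ** 2)

f_est = minimize(cost, np.zeros(3 * T), method="L-BFGS-B").x
print("mean |force error| [N]:", np.abs(f_est - f_true).mean())
```

In the paper's full setting, the same recipe runs with a rigid-body state, torque as an additional control, and camera poses from monocular reconstruction (up to scale and orientation) rather than known landmarks.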

Original language: English (US)
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
DOI: 10.1109/TPAMI.2018.2883327
State: Accepted/In press - Jan 1 2018

Keywords

  • Cameras
  • Dynamics
  • First-person Vision
  • Gravity
  • Optimal Control
  • Physical Sensation
  • Three-dimensional displays
  • Torque
  • Visualization

PubMed: MeSH publication types

  • Journal Article

Cite this

@article{195c40e547694e4ab3d416517543e585,
title = "Force from Motion: Decoding Control Force of Activity in a First Person Video",
keywords = "Cameras, Dynamics, First-person Vision, Gravity, Optimal Control, Physical Sensation, Three-dimensional displays, Torque, Visualization",
author = "Hyun Park and Jianbo Shi",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/TPAMI.2018.2883327",
language = "English (US)",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
issn = "0162-8828",
publisher = "IEEE Computer Society",
}

TY - JOUR

T1 - Force from Motion

T2 - Decoding Control Force of Activity in a First Person Video

AU - Park, Hyun

AU - Shi, Jianbo

PY - 2018/1/1

Y1 - 2018/1/1

KW - Cameras

KW - Dynamics

KW - First-person Vision

KW - Gravity

KW - Optimal Control

KW - Physical Sensation

KW - Three-dimensional displays

KW - Torque

KW - Visualization

UR - http://www.scopus.com/inward/record.url?scp=85057379628&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057379628&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2018.2883327

DO - 10.1109/TPAMI.2018.2883327

M3 - Article

C2 - 30489262

AN - SCOPUS:85057379628

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

SN - 0162-8828

ER -