Abstract
Object-centric representation is an essential abstraction for forward prediction. Most existing forward models learn this representation through extensive supervision (e.g., object classes and bounding boxes), although such ground-truth information is not readily accessible in reality. To address this, we introduce KINet (Keypoint Interaction Network), an end-to-end unsupervised framework that reasons about object interactions using a keypoint representation. From visual observations, our model learns to associate objects with keypoint coordinates and discovers a graph representation of the system as a set of keypoint embeddings and their relations. It then learns an action-conditioned forward model using contrastive estimation to predict future keypoint states. By learning to perform physical reasoning in the keypoint space, our model automatically generalizes to scenarios with a different number of objects, novel backgrounds, and unseen object geometries. Experiments demonstrate the effectiveness of our model in accurately performing forward prediction and in learning plannable object-centric representations for downstream robotic pushing manipulation tasks.
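The abstract does not give the network architecture or the exact contrastive objective, so the following is only an illustrative sketch of the two ideas it names, not the authors' implementation. It assumes: a fully connected graph over K 2-D keypoints, one round of message passing with random linear weights (hypothetical `W_edge`, `W_node`), the action concatenated to every node, and an InfoNCE-style loss that uses negative squared Euclidean distance as the score and treats keypoint states from other samples as negatives. The functions `predict_next_keypoints` and `infonce_loss` are hypothetical names. Only the forward pass and the loss value are shown; training would need an autodiff framework.

```python
import numpy as np

def predict_next_keypoints(kp, action, W_edge, W_node):
    """One round of message passing over a fully connected keypoint graph.

    Assumed shapes: kp (K, 2) keypoints at time t; action (A,) push action;
    W_edge (2, H); W_node (2 + H + A, 2). Returns predicted keypoints (K, 2).
    """
    K = kp.shape[0]
    rel = kp[:, None, :] - kp[None, :, :]        # (K, K, 2): kp[i] - kp[j]
    messages = np.tanh(rel @ W_edge)             # (K, K, H) edge messages
    mask = 1.0 - np.eye(K)[:, :, None]           # drop self-edges
    agg = (messages * mask).sum(axis=1)          # (K, H): sum over neighbours j
    # Every node sees its own position, its aggregated messages, and the action.
    node_in = np.concatenate([kp, agg, np.tile(action, (K, 1))], axis=1)
    return kp + node_in @ W_node                 # predicted displacement added to kp

def infonce_loss(pred, target, negatives):
    """InfoNCE-style loss; score = negative squared distance.

    pred, target: (K, 2); negatives: (N, K, 2). The positive sits at index 0.
    """
    pos = -np.sum((pred - target) ** 2)
    neg = -np.sum((pred[None] - negatives) ** 2, axis=(1, 2))   # (N,)
    logits = np.concatenate([[pos], neg])
    logits = logits - logits.max()               # numerically stable log-softmax
    return -(logits[0] - np.log(np.exp(logits).sum()))

# Toy usage with random weights (illustrative values only).
rng = np.random.default_rng(0)
K, H, A = 4, 8, 2
W_edge = rng.normal(scale=0.1, size=(2, H))
W_node = rng.normal(scale=0.1, size=(2 + H + A, 2))
kp_t = rng.uniform(-1, 1, size=(K, 2))
action = np.array([0.1, 0.0])
kp_next = kp_t + action                          # toy target: everything shifts with the push
pred = predict_next_keypoints(kp_t, action, W_edge, W_node)
negatives = rng.uniform(-1, 1, size=(5, K, 2))
loss = infonce_loss(pred, kp_next, negatives)
print(pred.shape, loss >= 0)
```

Because messages are summed over neighbours with shared weights, the same parameters accept any number of keypoints and the prediction is permutation-equivariant. This is one plausible reason a keypoint-graph formulation can transfer to scenes with a different number of objects.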
| Original language | English (US) |
|---|---|
| Pages (from-to) | 6195-6202 |
| Number of pages | 8 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 8 |
| Issue number | 10 |
| DOIs | |
| State | Published - Oct 1 2023 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2016 IEEE.
Keywords
- Representation learning
- Deep learning methods
- Manipulation planning
Fingerprint
Dive into the research topics of 'KINet: Unsupervised Forward Models for Robotic Pushing Manipulation'. Together they form a unique fingerprint.