TY - JOUR
T1 - Talk Through It
T2 - End User Directed Manipulation Learning
AU - Winge, Carl
AU - Imdieke, Adam
AU - Aldeeb, Bahaa
AU - Kang, Dongyeop
AU - Desingh, Karthik
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Training robots to perform a huge range of tasks in many different environments is immensely difficult. Instead, we propose selectively training robots based on end-user preferences. Given a factory model that lets an end user instruct a robot to perform lower-level actions (e.g. 'Move left'), we show that end users can collect demonstrations using language to train their home model for higher-level tasks specific to their needs (e.g. 'Open the top drawer and put the block inside'). We demonstrate this framework on robot manipulation tasks using RLBench environments. Our method results in a 13% improvement in task success rates compared to a baseline method. We also explore the use of the large vision-language model (VLM), Bard, to automatically break down tasks into sequences of lower-level instructions, aiming to bypass end-user involvement. The VLM is unable to break tasks down to our lowest level, but does achieve good results breaking high-level tasks into mid-level skills.
AB - Training robots to perform a huge range of tasks in many different environments is immensely difficult. Instead, we propose selectively training robots based on end-user preferences. Given a factory model that lets an end user instruct a robot to perform lower-level actions (e.g. 'Move left'), we show that end users can collect demonstrations using language to train their home model for higher-level tasks specific to their needs (e.g. 'Open the top drawer and put the block inside'). We demonstrate this framework on robot manipulation tasks using RLBench environments. Our method results in a 13% improvement in task success rates compared to a baseline method. We also explore the use of the large vision-language model (VLM), Bard, to automatically break down tasks into sequences of lower-level instructions, aiming to bypass end-user involvement. The VLM is unable to break tasks down to our lowest level, but does achieve good results breaking high-level tasks into mid-level skills.
KW - Human-centered robotics
KW - incremental learning
KW - learning from demonstration
UR - http://www.scopus.com/inward/record.url?scp=85199538997&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85199538997&partnerID=8YFLogxK
U2 - 10.1109/LRA.2024.3433309
DO - 10.1109/LRA.2024.3433309
M3 - Article
AN - SCOPUS:85199538997
SN - 2377-3766
VL - 9
SP - 8051
EP - 8058
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 9
ER -