Hybrid Gradient-Based Policy Optimization for Sample-Efficient Policy Learning in Autonomous Systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper introduces HyGIPO, a novel gradient-based iterative policy optimization technique designed for efficient policy learning in autonomous systems, especially in the presence of modeling errors. The performance of control algorithms for autonomous systems is often degraded by mismatches between a simplified nominal model and the complex real system. To mitigate this degradation, HyGIPO leverages a hybrid gradient optimization approach, combining dynamics gradients from a nominal model with real-world data to optimize control policies. We apply this method to the quadcopter waypoint tracking problem, with the controller parameterized by a neural network, and demonstrate its effectiveness in both simulation and hardware experiments. In simulation, HyGIPO learns the policy within a hundred samples, achieving orders-of-magnitude higher sample efficiency than reinforcement learning methods. Hardware experiments further validate the method, achieving successful tracking in just tens of samples.
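The abstract's core idea, propagating policy-gradient sensitivities through a nominal model while evaluating states along real-system rollouts, can be illustrated with a minimal sketch. Everything below is an illustrative assumption: a 1-D linear system with a scalar linear policy stands in for the paper's quadcopter dynamics and neural-network controller, and this is not the authors' actual algorithm or code.

```python
import numpy as np

# Toy 1-D setup (illustrative assumption, not the paper's system):
# real dynamics  x+ = a_real*x + b_real*u, known only through samples;
# nominal model  x+ = a_nom*x  + b_nom*u, used for analytic gradients.
a_real, b_real = 0.9, 1.1   # "true" plant, hidden from the learner
a_nom, b_nom = 1.0, 1.0     # simplified nominal model

def rollout(theta, x0, T=10):
    """Run the linear policy u = -theta*x on the real system and
    record the visited states (the 'real-world data')."""
    xs = [x0]
    for _ in range(T):
        u = -theta * xs[-1]
        xs.append(a_real * xs[-1] + b_real * u)  # real-system step
    return np.array(xs)

def hybrid_gradient(theta, xs):
    """Hybrid gradient of the rollout cost J = sum_t xs[t]**2:
    the states come from the real rollout, but the sensitivity
    dx/dtheta is propagated with the *nominal* model Jacobians."""
    dx = 0.0    # dx_t/dtheta, with dx_0/dtheta = 0
    grad = 0.0
    for t in range(len(xs) - 1):
        grad += 2.0 * xs[t] * dx
        du = -xs[t] - theta * dx        # du_t/dtheta for u = -theta*x
        dx = a_nom * dx + b_nom * du    # nominal-model propagation
    grad += 2.0 * xs[-1] * dx
    return grad

theta, x0 = 0.1, 1.0
for _ in range(50):     # a few dozen real rollouts, plain gradient descent
    xs = rollout(theta, x0)
    theta -= 0.05 * hybrid_gradient(theta, xs)

print(f"learned gain: {theta:.3f}")  # approaches a_real/b_real ~ 0.818
```

Even though the sensitivity chain uses the wrong (nominal) Jacobians, the gradient is evaluated along real trajectories, so its zero coincides with the real closed-loop optimum here; this captures, in miniature, why mixing model gradients with real data can be far more sample-efficient than model-free exploration.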

Original language: English (US)
Title of host publication: 2025 American Control Conference, ACC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3035-3040
Number of pages: 6
ISBN (Electronic): 9798331569372
State: Published - 2025
Event: 2025 American Control Conference, ACC 2025 - Denver, United States
Duration: Jul 8, 2025 to Jul 10, 2025

Publication series

Name: Proceedings of the American Control Conference
ISSN (Print): 0743-1619

Conference

Conference: 2025 American Control Conference, ACC 2025
Country/Territory: United States
City: Denver
Period: 7/8/25 to 7/10/25

Bibliographical note

Publisher Copyright:
© 2025 AACC.
