
Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning

  • Yassine Chemingui
  • Aryan Deshwal
  • Honghao Wei
  • Alan Fern
  • Jana Doppa

Research output: Contribution to journal › Conference article › peer-review

Abstract

Offline safe reinforcement learning (OSRL) involves learning a decision-making policy from a fixed batch of training data that maximizes rewards while satisfying pre-defined safety constraints. However, adapting to varying safety constraints during deployment without retraining remains an under-explored challenge. To address this challenge, we introduce constraint-adaptive policy switching (CAPS), a wrapper framework around existing offline RL algorithms. During training, CAPS uses offline data to learn multiple policies with a shared representation that optimize different reward and cost trade-offs. During testing, CAPS switches between those policies by selecting at each state the policy that maximizes future rewards among those that satisfy the current cost constraint. Our experiments on 38 tasks from the DSRL benchmark demonstrate that CAPS consistently outperforms existing methods, establishing a strong wrapper-based baseline for OSRL.
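
The test-time switching rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` class, the `switch_policy` function, the constant value estimates, and the fallback used when no policy is feasible are all hypothetical and chosen only for illustration. The sketch assumes each learned policy comes with some estimate of its future reward and future cost at the current state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class Candidate:
    """One learned policy plus (hypothetical) value estimates for it."""
    name: str
    act: Callable[[Any], Any]             # state -> action
    reward_value: Callable[[Any], float]  # state -> estimated future reward
    cost_value: Callable[[Any], float]    # state -> estimated future cost


def switch_policy(state: Any, candidates: Sequence[Candidate],
                  cost_budget: float) -> Candidate:
    """Pick the highest-reward candidate whose estimated cost fits the budget."""
    feasible = [c for c in candidates if c.cost_value(state) <= cost_budget]
    if feasible:
        return max(feasible, key=lambda c: c.reward_value(state))
    # Assumption (not stated in the abstract): if no candidate is estimated
    # to be feasible, fall back to the one with the lowest estimated cost.
    return min(candidates, key=lambda c: c.cost_value(state))


# Toy candidates with constant estimates, ordered from cautious to aggressive.
candidates = [
    Candidate("cautious", lambda s: 0, lambda s: 1.0, lambda s: 0.5),
    Candidate("balanced", lambda s: 1, lambda s: 2.0, lambda s: 3.0),
    Candidate("aggressive", lambda s: 2, lambda s: 5.0, lambda s: 10.0),
]

chosen = switch_policy(state=None, candidates=candidates, cost_budget=4.0)
action = chosen.act(None)
```

Because the budget is an argument of the selection step rather than of training, the same set of policies can be reused under a different constraint at deployment, which is the adaptivity the abstract emphasizes.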

Original language: English (US)
Pages (from-to): 15722-15730
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue number: 15
DOIs
State: Published - Apr 11 2025
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: Feb 25 2025 to Mar 4 2025

Bibliographical note

Publisher Copyright:
Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
