This paper deals with online convex optimization involving both time-varying loss functions, and time-varying constraints. The loss functions are not fully accessible to the learner, and instead only the function values (also known as bandit feedback) are revealed at queried points. The constraints are revealed after making decisions, and can be instantaneously violated, yet they must be satisfied in the long term. This setting fits nicely the emerging online network tasks such as fog computing in the Internet-of-Things, where online decisions must flexibly adapt to the changing user preferences (loss functions), and the temporally unpredictable availability of resources (constraints). Tailored for such human-in-the-loop systems where the loss functions are hard to model, a family of online bandit saddle-point (BanSaP) schemes are developed, which adaptively adjust the online operations based on (possibly multiple) bandit feedback of the loss functions, and the changing environment. Performance here is assessed by: 1) dynamic regret that generalizes the widely used static regret and 2) fit that captures the accumulated amount of constraint violations. Specifically, BanSaP is proved to simultaneously yield sublinear dynamic regret and fit, provided that the best dynamic solutions vary slowly over time. Numerical tests in fog computation offloading tasks corroborate that our proposed BanSaP approach offers competitive performance relative to existing approaches that are based on gradient feedback.
Bibliographical noteFunding Information:
Manuscript received July 17, 2017; revised December 23, 2017; accepted May 4, 2018. Date of publication May 22, 2018; date of current version February 25, 2019. This work was supported by NSF under Grant 1509040, Grant 1508993, and Grant 1711471. This paper was presented in part at the International Conference on Acoustics, Speech and Signal Processing, Calgary, AB, Canada, April 2018 . (Corresponding author: Georgios B. Giannakis.) The authors are with the Department of Electrical and Computer Engineering and the Digital Technology Center, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: email@example.com; firstname.lastname@example.org). Digital Object Identifier 10.1109/JIOT.2018.2839563
- Bandit convex optimization (BCO)
- Internet of Things (IoT)
- mobile edge computing
- online learning
- saddle-point method