Interference is a major performance challenge for cloud users and can significantly degrade application performance on virtual machines (VMs). For load-balanced cloud applications, a key question is how to distribute the load among VMs in the presence of interference. Using a Markov decision process (MDP) model, we investigate dynamic control policies for assigning jobs to a cluster of interference-prone VMs in a system with a central queue and an arbitrary number of VMs. We characterize the structural properties of the MDP optimality equation and prove that the optimal control policy is a threshold policy based on the queue length. The optimal policy is characterized by multiple thresholds that depend on the current conditions of the VMs, including the number of busy VMs under interference. We discuss the existence of an ordering among these thresholds and prove the ordering for a two-VM system. Our numerical results show that the optimal dynamic policy can significantly improve performance compared to the commonly employed non-idling policy; for low-utilization systems, we observe improvements of around 20%. We further implement the optimal policy in a real-world testbed using the HAProxy load balancer and show that it can reduce web server response times by as much as 40% to 60%, even under time-varying request rates.
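To make the queue-length threshold structure concrete, here is a minimal sketch that solves a deliberately simplified dispatching MDP by discounted value iteration. It is not the authors' model. It assumes two VMs, one running normally at rate `mu_fast` and one permanently under interference at the lower rate `mu_slow`; Poisson arrivals at rate `lam`; a central queue truncated at `q_max` (arrivals beyond it are dropped); a holding cost equal to the number of jobs in the system; and an idle normal VM that always takes a waiting job. The only decision is whether to send a queued job to the idle interfered VM. The functions `solve` and `slow_threshold` and all parameter names are hypothetical.

```python
from itertools import product


def solve(lam, mu_fast, mu_slow, q_max=40, gamma=0.99, tol=1e-6):
    """Discounted value iteration for a hypothetical two-VM dispatching MDP.

    State (n, bf, bs): n jobs waiting in the central queue, bf/bs = 1 if the
    normal / interfered VM is busy. Returns (V, policy), where policy[s] is True
    when sending a queued job to the idle interfered VM is strictly optimal.
    """
    rate = lam + mu_fast + mu_slow  # uniformization constant
    states = list(product(range(q_max + 1), (0, 1), (0, 1)))
    V = {s: 0.0 for s in states}

    def step_cost(n, bf, bs):
        # Holding cost plus discounted expected value of the next event,
        # evaluated at a post-decision state (jobs already dispatched).
        nxt = (lam * V[(min(n + 1, q_max), bf, bs)]  # arrival (dropped at q_max)
               + mu_fast * V[(n, 0, bs)]             # normal VM completes (self-loop if idle)
               + mu_slow * V[(n, bf, 0)])            # interfered VM completes (self-loop if idle)
        return n + bf + bs + gamma * nxt / rate

    def options(n, bf, bs):
        # Assumption: an idle normal VM always takes a waiting job.
        if bf == 0 and n > 0:
            n, bf = n - 1, 1
        keep = (n, bf, bs)
        # Optional action: also dispatch a waiting job to the idle interfered VM.
        assign = (n - 1, bf, 1) if bs == 0 and n > 0 else None
        return keep, assign

    while True:
        new_V = {}
        for s in states:
            keep, assign = options(*s)
            best = step_cost(*keep)
            if assign is not None:
                best = min(best, step_cost(*assign))
            new_V[s] = best
        diff = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V  # step_cost reads the rebound V through its closure
        if diff < tol:
            break

    policy = {}
    for s in states:
        keep, assign = options(*s)
        policy[s] = assign is not None and step_cost(*assign) < step_cost(*keep)
    return V, policy


def slow_threshold(policy, q_max=40):
    """Smallest queue length at which a job goes to the interfered VM while the
    normal VM is busy; q_max + 1 means 'never' within the truncated queue."""
    for n in range(1, q_max + 1):
        if policy[(n, 1, 0)]:
            return n
    return q_max + 1
```

With identical service rates, for example `slow_threshold(solve(0.3, 1.0, 1.0)[1])`, the threshold is 1. In that case, sending every waiting job to an idle VM, which is the non-idling policy, is optimal. A much slower interfered VM, for example `mu_slow=0.05`, gives a threshold above 1: short queues wait for the normal VM instead. Truncating the queue can distort decisions near `q_max`, so only thresholds well below it are meaningful.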
|Original language||English (US)|
|Title of host publication||Proceedings - 2019 IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2019|
|Publisher||IEEE Computer Society|
|Number of pages||14|
|State||Published - Oct 2019|
|Event||27th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2019 - Rennes, France|
Duration: Oct 22 2019 → Oct 25 2019
|Name||Proceedings - IEEE Computer Society's Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS|
|Conference||27th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2019|
|Period||10/22/19 → 10/25/19|
Bibliographical note: Funding Information:
ACKNOWLEDGMENT: This work was supported by NSF CNS grants 1617046, 1717588, and 1750109.
© 2019 IEEE.
- Cloud Computing
- Markov Chains
- Markov Decision Process
- Optimal Control of Queues