Abstract
We present a system architecture and its underlying mechanisms for building autonomically scalable and resilient services on cooperatively shared computing platforms. Specifically, we focus on platforms with the following characteristics. The resources at a node are allocated to competing users on a fair-share basis, without reserved resource capacity for any user. There is no platform-wide resource manager for placing users on nodes; instead, users independently select nodes for their applications. Moreover, a node can become unavailable at any time due to a crash or shutdown. Building scalable services in such environments poses unique challenges because of node-level fluctuations in available resource capacity and because of node crashes. In addition, the service load may surge within a short time due to flash crowds. Service capacity is scaled autonomically by dynamically controlling the degree of service replication based on the estimated service capacity and the observed load. We present models for estimating the service capacity at a node under fluctuating operating conditions. Furthermore, we develop adaptive and agile mechanisms for distributing load among replicas according to their time-varying service capacities. We present the results of evaluating these mechanisms on PlanetLab, which exemplifies the platform-level characteristics considered here.
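The abstract describes two coupled control loops: deciding how many replicas to run from estimated capacity and observed load, and spreading requests across replicas in proportion to their time-varying capacities. The following is a minimal illustrative sketch of that idea, not the authors' models. Everything in it is an assumption introduced for illustration: the hypothetical `CapacityEstimator` uses an exponentially weighted moving average (EWMA) of measured service rates in place of the paper's capacity models; `replica_delta` uses a hypothetical utilization band (`low`, `high`) and assumes each added or removed replica has the current mean capacity; and `pick_replica` dispatches each request by capacity-weighted random choice.

```python
import math
import random


class CapacityEstimator:
    """Hypothetical per-replica capacity estimate (requests/s).

    Uses an EWMA of measured service rates as a stand-in for the
    paper's capacity models, which are not reproduced here.
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight given to the newest sample
        self.estimate = None    # no estimate until the first sample arrives

    def update(self, measured_rate):
        if self.estimate is None:
            self.estimate = measured_rate
        else:
            self.estimate = (self.alpha * measured_rate
                             + (1 - self.alpha) * self.estimate)
        return self.estimate


def replica_delta(observed_load, capacities, high=0.8, low=0.4):
    """Replicas to add (>0) or remove (<0); 0 means keep the current count.

    Aggregate utilization = observed_load / sum(capacities). Outside the
    [low, high] band, choose the smallest replica count that keeps
    utilization <= high, assuming every replica has the mean capacity
    (an illustrative assumption).
    """
    n = len(capacities)
    total = sum(capacities)
    if n == 0 or total <= 0:
        raise ValueError("need at least one replica with positive capacity")
    mean = total / n
    utilization = observed_load / total
    if utilization > high or (utilization < low and n > 1):
        target = max(1, math.ceil(observed_load / (high * mean)))
        return target - n
    return 0


def load_shares(total_load, capacities):
    """Split load across replicas in proportion to their capacity estimates."""
    total = sum(capacities)
    return [total_load * c / total for c in capacities]


def pick_replica(capacities, rng=random):
    """Per-request dispatch: pick replica i with probability c_i / sum(c)."""
    return rng.choices(range(len(capacities)), weights=capacities, k=1)[0]


if __name__ == "__main__":
    caps = [100.0, 80.0, 60.0]               # current estimates, requests/s
    print(replica_delta(300, caps))          # overload: add replicas
    print(load_shares(240, caps))            # capacity-proportional split
    print(pick_replica(caps))                # index of a chosen replica
```

In this sketch the gap between `low` and `high` acts as hysteresis, so small load fluctuations do not cause the replica count to oscillate. Recomputing `load_shares` or the `pick_replica` weights whenever a capacity estimate changes is one simple way to let load distribution track time-varying capacities.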
Original language | English (US) |
---|---|
Pages (from-to) | 1251-1276 |
Number of pages | 26 |
Journal | Software - Practice and Experience |
Volume | 44 |
Issue number | 10 |
DOIs | |
State | Published - Oct 2014 |
Bibliographical note
Publisher Copyright: Copyright © 2013 John Wiley & Sons, Ltd.
Keywords
- Autonomic systems
- Distributed systems
- Load balancing
- Replication management
- Resilient services