TY - JOUR
T1 - To clean or not to clean
T2 - Malware removal strategies for servers under load
AU - Doroudi, Sherwin
AU - Avgerinos, Thanassis
AU - Harchol-Balter, Mor
N1 - Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2021/7/16
Y1 - 2021/7/16
N2 - We consider how to best schedule reparative downtime for a customer-facing online service that is vulnerable to cyber attacks such as malware infections. These infections can cause performance degradation (i.e., a slower service rate) and facilitate data theft, both of which have monetary repercussions. Infections may go undetected and can only be removed by time-consuming cleanup procedures, which require temporarily taking the service offline. From a security-oriented perspective, cleanups should be undertaken as frequently as possible. From a performance-oriented perspective, frequent cleanups are desirable because they maintain faster service, but they are simultaneously undesirable because they lead to more frequent downtimes and subsequent loss of revenue. We ask when and how often cleanups should happen. In order to analyze various downtime scheduling policies, we combine queueing-theoretic techniques with a revenue model to capture the problem's tradeoffs. Unlike classical repair problems, this problem necessitates the analysis of a quasi-birth-death Markov chain, tracking the number of customer requests in the system and the (possibly unknown) infection state. We adapt a recent analytic technique, Clearing Analysis on Phases (CAP), to determine the exact steady-state distribution of the underlying Markov chain, which we then use to compute revenue rates and make recommendations. Prior work on downtime scheduling under cyber attacks relies on heuristic approaches, with our work being the first to address this problem analytically.
AB - We consider how to best schedule reparative downtime for a customer-facing online service that is vulnerable to cyber attacks such as malware infections. These infections can cause performance degradation (i.e., a slower service rate) and facilitate data theft, both of which have monetary repercussions. Infections may go undetected and can only be removed by time-consuming cleanup procedures, which require temporarily taking the service offline. From a security-oriented perspective, cleanups should be undertaken as frequently as possible. From a performance-oriented perspective, frequent cleanups are desirable because they maintain faster service, but they are simultaneously undesirable because they lead to more frequent downtimes and subsequent loss of revenue. We ask when and how often cleanups should happen. In order to analyze various downtime scheduling policies, we combine queueing-theoretic techniques with a revenue model to capture the problem's tradeoffs. Unlike classical repair problems, this problem necessitates the analysis of a quasi-birth-death Markov chain, tracking the number of customer requests in the system and the (possibly unknown) infection state. We adapt a recent analytic technique, Clearing Analysis on Phases (CAP), to determine the exact steady-state distribution of the underlying Markov chain, which we then use to compute revenue rates and make recommendations. Prior work on downtime scheduling under cyber attacks relies on heuristic approaches, with our work being the first to address this problem analytically.
KW - Computer security
KW - Maintenance
KW - Malware
KW - Markov processes
KW - Queueing
UR - http://www.scopus.com/inward/record.url?scp=85100575185&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100575185&partnerID=8YFLogxK
U2 - 10.1016/j.ejor.2020.10.036
DO - 10.1016/j.ejor.2020.10.036
M3 - Article
AN - SCOPUS:85100575185
SN - 0377-2217
VL - 292
SP - 596
EP - 609
JO - European Journal of Operational Research
JF - European Journal of Operational Research
IS - 2
ER -