TY - GEN
T1 - A provider-side view of web search response time
AU - Chen, Yingying
AU - Mahajan, Ratul
AU - Sridharan, Baskar
AU - Zhang, Zhi Li
PY - 2013
Y1 - 2013
AB - Using a large Web search service as a case study, we highlight the challenges that modern Web services face in understanding and diagnosing the response time experienced by users. We show that search response time (SRT) varies widely over time and also exhibits counter-intuitive behavior. It is actually higher during off-peak hours, when the query load is lower, than during peak hours. To resolve this paradox and explain SRT variations in general, we develop an analysis framework that separates systemic variations due to periodic changes in service usage and anomalous variations due to unanticipated events such as failures and denial-of-service attacks. We find that systemic SRT variations are primarily caused by systemic changes in aggregate network characteristics, nature of user queries, and browser types. For instance, one reason for higher SRTs during off-peak hours is that during those hours a greater fraction of queries come from slower, mainly-residential networks. We also develop a technique that, by factoring out the impact of such variations, robustly detects and diagnoses performance anomalies in SRT. Deployment experience shows that our technique detects three times more true (operator-verified) anomalies than existing techniques.
KW - anomaly detection and diagnosis
KW - performance monitoring
KW - search response time
KW - web services
UR - http://www.scopus.com/inward/record.url?scp=84891593576&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84891593576&partnerID=8YFLogxK
U2 - 10.1145/2534169.2486035
DO - 10.1145/2534169.2486035
M3 - Conference contribution
AN - SCOPUS:84891593576
SN - 9781450320566
T3 - Computer Communication Review
SP - 243
EP - 254
BT - Proceedings of the SIGCOMM 2013 and Best Papers of the Co-Located Workshops
T2 - Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013
Y2 - 12 August 2013 through 16 August 2013
ER -