Inferring applications at the network layer using collective traffic statistics

Yu Jin, Nick Duffield, Patrick Haffner, Subhabrata Sen, Zhi Li Zhang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Scopus citations

Abstract

Operating, managing and securing networks require a thorough understanding of the demands placed on the network by the endpoints it interconnects, the characteristics of the traffic the endpoints generate, and the distribution of that traffic over the resources of the network infrastructure. A major differentiator in the types of resource required by traffic is the class of endpoint application that generates it. Service providers determine the application mix present in traffic via measurements, e.g., flow measurements furnished by routers. Previous work has shown that a fairly accurate determination of application type can be made from this data. However, protocol level information, such as TCP/UDP ports and other parts of the transport header, and also parts of the network header in some cases, may not be accessible due to the use of encryption or tunneling protocols by endpoints or gateways. Furthermore, the utility of ports as signifiers of application type has some limitations due to abuse and non-standard usage, amongst other reasons. These factors reduce the classification accuracy. In this paper, we propose a novel technique for inferring the distribution of application classes present in the aggregated traffic flows between endpoints that exploits both the measured statistics of the traffic flows and the spatial distribution of those flows across the network. Our method employs a two-step supervised model, where the bootstrapping step provides initial (inaccurate) inference on the traffic application classes, and the graph-based calibration step adjusts the initial inference through the collective spatial traffic distribution. In evaluations using real traffic flow measurements from a large ISP, we show how our method can accurately classify application types within aggregate traffic between endpoints, even without knowledge of ports and other traffic features. While the bootstrap estimate classifies the aggregates with 80% accuracy, incorporating spatial distributions through calibration increases the accuracy to 92%, i.e., roughly halving the number of errors.
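The two-step idea in the abstract (an initial per-aggregate estimate, then an adjustment using how flows are distributed across the network) can be illustrated with a minimal sketch. This is not the paper's algorithm: the neighbor definition (aggregates that share an endpoint), the averaging update, the `alpha` weight, and the names `calibrate` and `classify` are all assumptions chosen for illustration, and the bootstrap probabilities are made-up inputs rather than output of a trained classifier.

```python
from collections import defaultdict


def calibrate(initial, alpha=0.5, iterations=10):
    """Smooth per-aggregate class distributions over a graph (illustrative only).

    initial: dict mapping an aggregate (src, dst) to {class_name: probability}.
    alpha:   hypothetical weight kept on the bootstrap estimate at each update.
    """
    # Assumption: two aggregates are neighbors when they share an endpoint.
    by_endpoint = defaultdict(list)
    for edge in initial:
        for endpoint in edge:
            by_endpoint[endpoint].append(edge)
    neighbors = {
        edge: sorted({o for ep in edge for o in by_endpoint[ep] if o != edge})
        for edge in initial
    }

    classes = sorted({c for dist in initial.values() for c in dist})
    current = {e: {c: d.get(c, 0.0) for c in classes} for e, d in initial.items()}

    for _ in range(iterations):
        updated = {}
        for edge, dist in current.items():
            nbrs = neighbors[edge]
            if not nbrs:
                # An isolated aggregate keeps its current estimate.
                updated[edge] = dict(dist)
                continue
            # Blend the bootstrap estimate with the neighbors' mean estimate.
            updated[edge] = {
                c: alpha * initial[edge].get(c, 0.0)
                + (1 - alpha) * sum(current[n][c] for n in nbrs) / len(nbrs)
                for c in classes
            }
        current = updated
    return current


def classify(distributions):
    """Pick the most probable class for each aggregate."""
    return {e: max(d, key=d.get) for e, d in distributions.items()}


# Made-up bootstrap output for three clients talking to one server "s".
bootstrap = {
    ("a", "s"): {"web": 0.9, "p2p": 0.1},
    ("b", "s"): {"web": 0.8, "p2p": 0.2},
    ("c", "s"): {"web": 0.45, "p2p": 0.55},
}
calibrated_labels = classify(calibrate(bootstrap))
```

In this toy input the bootstrap step alone labels the aggregate `("c", "s")` as `p2p` by a narrow margin, while the smoothed estimate labels it `web`, because the other aggregates terminating at the same server are confidently `web`. Real data would need a principled choice of graph and update rule.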

Original language: English (US)
Title of host publication: 2010 22nd International Teletraffic Congress - Proceedings, ITC 22
State: Published - 2010
Event: 2010 22nd International Teletraffic Congress, ITC 22 - Amsterdam, Netherlands
Duration: Sep 7 2010 to Sep 9 2010

Publication series

Name: 2010 22nd International Teletraffic Congress - Proceedings, ITC 22

Other

Other: 2010 22nd International Teletraffic Congress, ITC 22
Country/Territory: Netherlands
City: Amsterdam
Period: 9/7/10 to 9/9/10
