Distributed computing applications increasingly rely on distributed data sources. However, the unpredictable cost of data access in large-scale computing infrastructures can lead to severe performance bottlenecks. Predictable data access is thus essential to accommodate the growing set of large-scale, data-intensive computing applications. In this regard, accurate estimation of network performance is crucial to meeting the performance goals of such applications. Passive estimation based on past measurements is attractive because its overhead is relatively small compared to explicit probing. In this paper, we take a passive approach to network performance estimation. Our approach differs from existing passive techniques, which rely either on past direct measurements between pairs of nodes or on topological similarities. Instead, we exploit secondhand measurements collected by other nodes, without any topological restrictions. We present Overlay Passive Estimation of Network performance (OPEN), a scalable framework that provides end-to-end network performance estimation based on secondhand measurements, and discuss how OPEN achieves cost-effective estimation in a large-scale infrastructure. Our extensive experimental results show that OPEN estimation is applicable to the replica and resource selection commonly used in distributed computing.
|Original language||English (US)|
|Number of pages||9|
|Journal||IEEE Transactions on Parallel and Distributed Systems|
|State||Published - 2011|
Bibliographical note
Funding Information:
The authors are grateful to the anonymous reviewers for their constructive comments. This work was supported in part by US National Science Foundation grants CNS-0643505 and IIS-0916425. Appendices with additional details and extended experimental results can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.201, together with the electronic version of the paper.
- Network performance estimation
- data-intensive computing
- replica selection
- resource selection
- secondhand estimation