TY - GEN
T1 - Measuring the heterogeneity of cross-company dataset
AU - Chen, Jia
AU - Yang, Ye
AU - Zhang, Wen
AU - Gay, Gregory
PY - 2010
Y1 - 2010
AB - As a standard practice, general effort estimate models are calibrated from large cross-company datasets. However, many of the records within such datasets are taken from companies that have calibrated the model to match their own local practices. Locally calibrated models are a double-edged sword; they often improve estimate accuracy for that particular organization, but they also encourage the growth of local biases. Such biases remain present when projects from that firm are used in a new cross-company dataset. Over time, such biases compound, and the reliability and accuracy of a general model derived from the data will be affected by the increased level of heterogeneity. In this paper, we propose a statistical measure of the exact level of heterogeneity of a cross-company dataset. In experimental tests, we measure the heterogeneity of two COCOMO-based datasets and demonstrate that one is more homogeneous than the other. Such a measure has potentially important implications for both model maintainers and model users. Furthermore, a heterogeneity measure can be used to inform users of the appropriate data handling techniques.
KW - estimation model calibration
KW - heterogeneous datasets
KW - parameter comparison
KW - software effort estimation
UR - http://www.scopus.com/inward/record.url?scp=80053208950&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053208950&partnerID=8YFLogxK
U2 - 10.1145/1961258.1961272
DO - 10.1145/1961258.1961272
M3 - Conference contribution
AN - SCOPUS:80053208950
SN - 9781450302814
T3 - ACM International Conference Proceeding Series
SP - 55
EP - 58
BT - 11th International Conference on Product Focused Software Development and Process Improvement, PROFES 2010
T2 - 11th International Conference on Product Focused Software Development and Process Improvement, PROFES 2010
Y2 - 21 June 2010 through 23 June 2010
ER -