Measuring the heterogeneity of cross-company dataset

Jia Chen, Ye Yang, Wen Zhang, Gregory Gay

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As a standard practice, general effort estimate models are calibrated from large cross-company datasets. However, many of the records within such datasets are taken from companies that have calibrated the model to match their own local practices. Locally calibrated models are a double-edged sword; they often improve estimate accuracy for that particular organization, but they also encourage the growth of local biases. Such biases remain present when projects from that firm are used in a new cross-company dataset. Over time, such biases compound, and the reliability and accuracy of a general model derived from the data will be affected by the increased level of heterogeneity. In this paper, we propose a statistical measure of the exact level of heterogeneity of a cross-company dataset. In experimental tests, we measure the heterogeneity of two COCOMO-based datasets and demonstrate that one is more homogeneous than the other. Such a measure has potentially important implications for both model maintainers and model users. Furthermore, a heterogeneity measure can be used to inform users of the appropriate data handling techniques.

Original language: English (US)
Title of host publication: 11th International Conference on Product Focused Software Development and Process Improvement, PROFES 2010
Pages: 55-58
Number of pages: 4
State: Published - 2010
Event: 11th International Conference on Product Focused Software Development and Process Improvement, PROFES 2010 - Limerick, Ireland
Duration: Jun 21 2010 → Jun 23 2010

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 11th International Conference on Product Focused Software Development and Process Improvement, PROFES 2010
Country: Ireland
City: Limerick
Period: 6/21/10 → 6/23/10

Keywords

  • estimation model calibration
  • heterogeneous datasets
  • parameter comparison
  • software effort estimation
