TY - JOUR
T1 - Comparison of Profile Similarity Measures for Genetic Interaction Networks
AU - Deshpande, Raamesh
AU - VanderSluis, Benjamin
AU - Myers, Chad L.
PY - 2013/7/10
Y1 - 2013/7/10
N2 - Analysis of genetic interaction networks often involves identifying genes with similar profiles, which is typically indicative of a common function. While several profile similarity measures have been applied in this context, they have never been systematically benchmarked. We compared a diverse set of correlation measures, including measures commonly used by the genetic interaction community as well as several other candidate measures, by assessing their utility in extracting functional information from genetic interaction data. We find that the dot product, one of the simplest vector operations, outperforms most other measures over a large range of gene pairs. More generally, linear similarity measures such as the dot product, Pearson correlation or cosine similarity perform better than set overlap measures such as the Jaccard coefficient. Similarity measures that involve L2-normalization of the profiles tend to perform better for the top-most similar pairs but perform less favorably when a larger set of gene pairs is considered or when the genetic interaction data is thresholded. Such measures are also less robust to the presence of noise and batch effects in the genetic interaction data. Overall, the dot product measure performs consistently among the best measures under a variety of different conditions and genetic interaction datasets.
UR - http://www.scopus.com/inward/record.url?scp=84880021782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84880021782&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0068664
DO - 10.1371/journal.pone.0068664
M3 - Article
C2 - 23874711
AN - SCOPUS:84880021782
SN - 1932-6203
VL - 8
JO - PLOS ONE
JF - PLOS ONE
IS - 7
M1 - e68664
ER -