Motivation: The increasing availability of gene expression microarray technology has resulted in the publication of thousands of microarray gene expression datasets investigating various biological conditions. This vast repository is still underutilized due to the lack of methods for fast, accurate exploration of the entire compendium.

Results: We have collected Saccharomyces cerevisiae gene expression microarray data containing roughly 2400 experimental conditions. We analyzed the functional coverage of this collection and we designed a context-sensitive search algorithm for rapid exploration of the compendium. A researcher using our system provides a small set of query genes to establish a biological search context; based on this query, we weight each dataset's relevance to the context, and within these weighted datasets we identify additional genes that are co-expressed with the query set. Our method exhibits an average increase in accuracy of 273% compared to previous mega-clustering approaches when recapitulating known biology. Further, we find that our search paradigm identifies novel biological predictions that can be verified through further experimentation. Our methodology provides the ability for biological researchers to explore the totality of existing microarray data in a manner useful for drawing conclusions and formulating hypotheses, which we believe is invaluable for the research community.
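
The two-stage search described in the abstract (weight each dataset by its relevance to the query, then rank genes by co-expression within the weighted datasets) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the weights or co-expression scores are computed, so the sketch assumes that a dataset's weight is the mean pairwise Pearson correlation among the query genes (floored at zero) and that a gene's score is the weight-averaged mean correlation with the query genes. The function names `pearson`, `dataset_weight`, and `search`, and the dict-of-profiles data layout, are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def dataset_weight(dataset, query):
    """Assumed relevance weight: mean pairwise correlation among the query
    genes present in this dataset, floored at 0 so that datasets in which
    the query genes disagree contribute nothing."""
    present = [g for g in query if g in dataset]
    pairs = list(combinations(present, 2))
    if not pairs:
        return 0.0
    mean_r = sum(pearson(dataset[a], dataset[b]) for a, b in pairs) / len(pairs)
    return max(mean_r, 0.0)


def search(datasets, query):
    """Rank non-query genes by their weighted average correlation with the query.

    `datasets` is a list of dicts mapping gene name -> expression profile.
    Returns (gene, score) pairs sorted from most to least co-expressed.
    """
    query = set(query)
    totals, weight_sums = {}, {}
    for dataset in datasets:
        w = dataset_weight(dataset, query)
        if w == 0.0:
            continue  # dataset judged irrelevant to this search context
        present = [q for q in query if q in dataset]
        for gene, profile in dataset.items():
            if gene in query:
                continue
            r = sum(pearson(profile, dataset[q]) for q in present) / len(present)
            totals[gene] = totals.get(gene, 0.0) + w * r
            weight_sums[gene] = weight_sums.get(gene, 0.0) + w
    scores = {g: totals[g] / weight_sums[g] for g in totals}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented for illustration): the query genes A and B move together
# in the first dataset and in opposite directions in the second.
relevant = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 2, 3, 5], "D": [4, 3, 2, 1]}
irrelevant = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [4, 3, 2, 1], "D": [1, 2, 3, 4]}
ranked = search([relevant, irrelevant], ["A", "B"])
```

In this toy example the second dataset receives zero weight because the query genes are anti-correlated in it, so the ranking is driven entirely by the first dataset, where gene C tracks the query and gene D opposes it. A realistic system would also need normalization across datasets and handling of missing values, which the sketch omits.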
|Original language||English (US)|
|Number of pages||8|
|State||Published - Oct 15 2007|
Bibliographical note
Funding Information:
The authors would like to thank the members of the Botstein, Kruglyak and Dunham laboratories for advice and input on the system. We also thank John Wiggins and Mark Schroeder for excellent technical support. O.G.T. is an Alfred P. Sloan Research Fellow. This research was partially supported by NSF grant CNS-0406415, NSF CAREER award DBI-0546275 to O.G.T., NIH grant R01 GM071966, NSF grant IIS-0513552, NIH grant T32 HG003284 and NIGMS Center of Excellence grant P50 GM071508 and partially supported by a Google Research Award.