Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (> 400-bp) repetitive sequences ("repeats"), including Long Interspersed Nuclear Element-1 (LINE-1) elements, are excluded from these regions. We further show that isochore membership does not distinguish housekeeping genes from tissue-specific genes and that repetitive sequence environment distinguishes housekeeping genes from tissue-specific genes in every isochore. The distinct repetitive sequence environment, in combination with other previously published sequence properties of housekeeping genes, was used to develop a method of predicting housekeeping genes on the basis of DNA sequence alone. Using expression across tissue types as a measure of success, we demonstrate that repetitive sequence environment is by far the most important sequence feature identified to date for distinguishing housekeeping genes.
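As a rough illustration of what "repetitive sequence environment" could mean as a numeric feature, the sketch below computes the fraction of a window around a gene that is covered by Alu elements and by repeats longer than 400 bp. This is not the authors' pipeline. The function name `repeat_context`, the 10-kb flank, the `(start, end, family)` annotation layout, and the assumption that annotated repeats do not overlap one another are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation layout: (start, end, family), half-open coordinates.
Repeat = Tuple[int, int, str]


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def repeat_context(
    gene_start: int,
    gene_end: int,
    repeats: List[Repeat],
    flank: int = 10_000,     # assumed window size, not taken from the paper
    long_cutoff: int = 400,  # "longer (> 400-bp)" threshold from the abstract
) -> Dict[str, float]:
    """Fraction of the gene window covered by Alu and by long repeats.

    Assumes the repeat annotations do not overlap each other, so covered
    bases can simply be summed.
    """
    win_start = max(0, gene_start - flank)
    win_end = gene_end + flank
    win_len = win_end - win_start

    alu_bp = 0
    long_bp = 0
    for start, end, family in repeats:
        covered = overlap(start, end, win_start, win_end)
        if covered == 0:
            continue
        if family.startswith("Alu"):
            alu_bp += covered
        # Length is judged on the full element, not just the part in the window.
        if end - start > long_cutoff:
            long_bp += covered

    return {
        "alu_fraction": alu_bp / win_len,
        "long_repeat_fraction": long_bp / win_len,
    }


if __name__ == "__main__":
    example_repeats = [
        (100, 400, "AluY"),     # short Alu fully inside the window
        (2500, 3500, "L1PA2"),  # long LINE-1 partly inside the window
        (5000, 5300, "AluSx"),  # outside the window
    ]
    print(repeat_context(1000, 2000, example_repeats, flank=1000))
```

Features of this kind, computed per gene, could then be passed to a classifier such as a random forest together with other sequence properties; the choice of window size and repeat families would need to follow the paper's actual definitions.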
Bibliographical note
Funding Information:
C.D.E. was supported by a UCLA-IGERT bioinformatics traineeship (NSF DGE-9987641). M.R. was supported by a Tumor Cell Biology Fellowship (USHHS Institutional National Research Service Award #T32 CA09056). Y.M. was supported in part by National Institutes of Health Grants GM6100701 and HD041451-02.
- Random forest
- Tissue-specific genes