TY - GEN
T1 - Prediction of HLA genes from SNP data and HLA haplotype frequencies
AU - Paunić, Vanja
AU - Steinbach, Michael S
AU - Kumar, Vipin
AU - Maiers, Martin
PY - 2012
Y1 - 2012
N2 - Variation in the Human Leukocyte Antigen (HLA) gene system is very important. It is one of the most polymorphic regions of the human genome and one of the most extensively studied regions due to its association with autoimmune, infectious, and inflammatory diseases, such as rheumatoid arthritis, celiac disease, multiple sclerosis and Type I diabetes. The HLA gene system also plays a crucial role in hematopoietic stem cell transplantation, where patients and donors are matched with respect to their HLA genes in order to maximize the chances of a successful transplant. Having complete HLA data is therefore of great use to clinicians and researchers. However, due to the region's polymorphism, obtaining such data is highly time- and cost-prohibitive. Genome-wide association studies that find strong associations within the HLA region would ideally identify the exact HLA alleles responsible for the association in order to determine the causal genes/variants. Here we propose a method to infer HLA alleles from widely available and affordable SNP genotype data. Our method takes into account the high linkage disequilibrium that exists in the region. We demonstrate that this additional information is an important asset in the HLA prediction problem.
AB - Variation in the Human Leukocyte Antigen (HLA) gene system is very important. It is one of the most polymorphic regions of the human genome and one of the most extensively studied regions due to its association with autoimmune, infectious, and inflammatory diseases, such as rheumatoid arthritis, celiac disease, multiple sclerosis and Type I diabetes. The HLA gene system also plays a crucial role in hematopoietic stem cell transplantation, where patients and donors are matched with respect to their HLA genes in order to maximize the chances of a successful transplant. Having complete HLA data is therefore of great use to clinicians and researchers. However, due to the region's polymorphism, obtaining such data is highly time- and cost-prohibitive. Genome-wide association studies that find strong associations within the HLA region would ideally identify the exact HLA alleles responsible for the association in order to determine the causal genes/variants. Here we propose a method to infer HLA alleles from widely available and affordable SNP genotype data. Our method takes into account the high linkage disequilibrium that exists in the region. We demonstrate that this additional information is an important asset in the HLA prediction problem.
KW - HLA imputation
KW - Human Leukocyte Antigen
KW - Multi-label prediction
KW - SNP data
UR - http://www.scopus.com/inward/record.url?scp=84873167638&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84873167638&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2012.74
DO - 10.1109/ICDMW.2012.74
M3 - Conference contribution
AN - SCOPUS:84873167638
SN - 9780769549255
T3 - Proceedings - 12th IEEE International Conference on Data Mining Workshops, ICDMW 2012
SP - 964
EP - 971
BT - Proceedings - 12th IEEE International Conference on Data Mining Workshops, ICDMW 2012
T2 - 12th IEEE International Conference on Data Mining Workshops, ICDMW 2012
Y2 - 10 December 2012 through 10 December 2012
ER -