Abstract
The Human Leukocyte Antigen (HLA) genes are among the most studied genes in the genome. This is due to their importance in bone marrow and solid organ transplantation, as well as their strong associations with many autoimmune, infectious, and inflammatory diseases. As such, they can be a highly valuable asset to clinicians and researchers for elucidating the biological mechanisms that may drive those diseases. The extraordinary genetic polymorphism in this region makes it very challenging to type. Several approaches have therefore been proposed for predicting HLA genes from widely available genome-wide single nucleotide polymorphism (SNP) data sets, in an attempt to reduce cost and make use of existing data. These methods use SNPs and high-resolution training HLA data to build models for predicting HLA genes in new samples. However, most existing HLA data sets are not available at high resolution (exact allele assignment) but instead contain allelic ambiguities (inexact allele assignments). This is because existing typing methodologies cannot always distinguish between several possible alleles at a given gene and therefore report an ambiguous allele assignment. Current approaches for predicting HLA genes from SNP data do not accommodate learning from ambiguous HLA data and, as such, miss the potential for an increased sample size and, consequently, improved prediction performance. In this paper, we propose Amb-EM, a novel algorithm for SNP-based prediction of HLA genes that builds its model from ambiguous HLA alleles and predicts high-resolution alleles. Additionally, we measure the impact that uncertainty in the training data has on prediction accuracy, and evaluate the algorithm on a real-world data set. Our results show that prediction from ambiguous HLA data outperforms the alternative approach, which first imputes the ambiguous data into high-resolution HLA alleles and then uses them to build the model.
Original language | English (US) |
---|---|
Title of host publication | ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics |
Publisher | Association for Computing Machinery |
Pages | 104-113 |
Number of pages | 10 |
ISBN (Electronic) | 9781450328944 |
State | Published - Sep 20 2014 |
Event | 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014 - Newport Beach, United States Duration: Sep 20 2014 → Sep 23 2014 |
Publication series
Name | ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics |
---|---|
Other
Other | 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014 |
---|---|
Country/Territory | United States |
City | Newport Beach |
Period | 9/20/14 → 9/23/14 |
Bibliographical note
Publisher Copyright: Copyright © 2014 ACM.
Keywords
- Ambiguous genotypes
- Expectation-maximization
- HLA prediction
- SNPs
- Uncertain data