Amb-EM: A SNP-based prediction of HLA alleles using ambiguous HLA Data

Vanja Paunić, Michael Steinbach, Abeer Madbouly, Vipin Kumar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Human Leukocyte Antigen (HLA) genes are some of the most studied genes in the genome. This is due to their importance in bone marrow and solid organ transplantation, as well as their strong associations with many autoimmune, infectious, and inflammatory diseases. As such, they can be a highly valuable asset to clinicians and researchers for elucidating the biological mechanisms that may drive those diseases. The extraordinary genetic polymorphism that exists in this region makes it very challenging to type. Therefore, several approaches have been proposed for predicting HLA genes from widely available genome-wide single nucleotide polymorphism (SNP) data sets, in an attempt to reduce cost and utilize existing data. These methods use SNPs and high-resolution training HLA data to build models for predicting HLA genes in new samples. However, most existing HLA data sets are not available in high resolution (exact allele assignment) but contain allelic ambiguities (inexact allele assignments). This is because existing typing methodologies cannot always distinguish between several possible alleles at a given gene and therefore produce an ambiguous allele assignment. Current approaches for predicting HLA genes from SNP data do not accommodate learning from ambiguous HLA data and, as such, miss the potential for an increased sample size and the resulting improvements in prediction performance. In this paper, we propose Amb-EM, a novel algorithm for SNP-based prediction of HLA genes that utilizes ambiguities in the HLA data and predicts high-resolution alleles using ambiguous HLA alleles to build the model. Additionally, we measure the impact that uncertainty in the training data has on prediction accuracy, and evaluate it on a real-world data set. Our results show that prediction from ambiguous HLA data outperforms the alternative approach, which first imputes the ambiguous data into high-resolution HLA alleles and then uses them to build the model.

Original language: English (US)
Title of host publication: ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery
Pages: 104-113
Number of pages: 10
ISBN (Electronic): 9781450328944
DOIs
State: Published - Sep 20 2014
Event: 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014 - Newport Beach, United States
Duration: Sep 20 2014 to Sep 23 2014

Publication series

Name: ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

Other

Other: 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014
Country/Territory: United States
City: Newport Beach
Period: 9/20/14 to 9/23/14

Bibliographical note

Publisher Copyright:
Copyright © 2014 ACM.

Keywords

  • Ambiguous genotypes
  • Expectation-maximization
  • HLA prediction
  • SNPs
  • Uncertain data
