Predicting Antigenic Distance from Genetic Data for PRRSV-Type 1: Applications of Machine Learning

Dennis N. Makau, Cinta Prieto, Francisco J. Martínez-Lobo, I. A.D. Paploski, Kimberly VanderWaal

Research output: Contribution to journal, Article, peer-review


The control of porcine reproductive and respiratory syndrome (PRRS) remains a significant challenge due to the genetic and antigenic variability of the causative virus (PRRSV). PRRSV management relies predominantly on vaccines and live virus inoculations to confer immunity against PRRSV on farms. While understanding cross-protection among strains is crucial for the continued success of these interventions, how genetic diversity translates to antigenic diversity remains elusive. We developed machine learning algorithms to estimate antigenic distance in silico from genetic sequence data and to identify amino acid sites whose differences are associated with antigenic differences between viruses. First, we obtained antigenic distance estimates derived from serum neutralization assays cross-reacting PRRSV monospecific antisera with isolates of 27 PRRSV1 viruses circulating in Europe. Antigenic distances were weakly to moderately associated with ectodomain amino acid distance for open reading frames (ORFs) 2 to 4 (r < 0.2) and ORF5 (r = 0.3), respectively. Dividing the antigenic distance values at the median, we then categorized the serum-virus pairs into two levels: low and high antigenic distance (dissimilarity). In the machine learning models, we used amino acid distances in the ectodomains of ORFs 2 to 5 and site-wise amino acid differences between the viruses as potential predictors of antigenic dissimilarity. Using mixed-effect gradient boosting models, we classified the antigenic distance between serum-virus pairs (high versus low) with an accuracy of 81% (95% confidence interval, 76 to 85%); sensitivity and specificity were 86% and 75%, respectively. We demonstrate that sequence data can be used to estimate antigenic distance and potential cross-protection between PRRSV1 strains.

IMPORTANCE

Understanding cross-protection between cocirculating PRRSV1 strains is crucial to reducing losses associated with PRRS outbreaks on farms. While experimental studies to determine cross-protection are instrumental, these in vivo studies are not always practical or timely for the many cocirculating and emerging PRRSV strains. In this study, we demonstrate the ability to rapidly estimate potential immunologic cross-reaction between different PRRSV1 strains in silico using sequence data routinely collected by production systems. These models can provide fast-turnaround information to support PRRS management decisions, such as selecting the vaccines or live virus inoculations to be used on farms, assessing the risk that emerging strains pose to farms previously exposed to certain PRRSV strains, and guiding vaccine development.

Original language: English (US)
Journal: Microbiology Spectrum
Issue number: 1
State: Published - Jan 2023

Bibliographical note

Funding Information:
We thank all personnel involved in the generation of these data. This project was supported by the USDA National Institute of Food and Agriculture (NIFA), the joint NIFA-NSF-NIH-BBSRC Ecology and Evolution of Infectious Disease award 2019-67015-29918 and BB/T004401/1 and by the USDA NIFA Critical Agricultural Research and Extension program 2022-68008-37146. We declare there are no conflicts of interest.

Publisher Copyright:
Copyright © 2022 Makau et al.


  • bioinformatics
  • cross-protection
  • immune response
  • immunodominant sites
  • immunogenicity
  • machine learning
  • seroneutralization

PubMed: MeSH publication types

  • Journal Article
  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

