The unreasonable effectiveness of convolutional neural networks in population genetic inference

Lex Flagel, Yaniv J Brandvain, Daniel R. Schrider

Research output: Contribution to journal › Article

Abstract

Population-scale genomic data sets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g., only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here, we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods and offer a new path forward in cases where no likelihood approach exists.
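The idea of treating an alignment as an image can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `alignment_to_image` and `convolve_rows`, the minor-allele-as-1 encoding, and the toy filter are all assumptions made for illustration. The sketch keeps only biallelic variable sites, encodes them as a 0/1 matrix (rows are sequences, columns are sites), and slides a small filter along the site axis, which is the basic operation a convolutional layer repeats with learned weights.

```python
from collections import Counter


def alignment_to_image(seqs):
    """Encode aligned sequences as a 0/1 matrix over biallelic variable sites.

    Rows are sequences and columns are sites. 1 marks the minor allele and
    0 the major allele (an illustrative encoding, not taken from the paper).
    """
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    columns = []
    for site in zip(*seqs):
        counts = Counter(site)
        if len(counts) != 2:
            continue  # skip invariant and multi-allelic sites
        # Sorting the alleles first makes tie-breaking deterministic.
        major = max(sorted(counts), key=lambda allele: counts[allele])
        columns.append([0 if allele == major else 1 for allele in site])
    # Transpose so that each row corresponds to one sequence.
    return [list(row) for row in zip(*columns)]


def convolve_rows(image, kernel):
    """Slide a 1D filter along each row (no padding, stride 1)."""
    width = len(kernel)
    return [
        [
            sum(k * row[i + j] for j, k in enumerate(kernel))
            for i in range(len(row) - width + 1)
        ]
        for row in image
    ]


if __name__ == "__main__":
    toy_alignment = ["ACGTAC", "ACGTTC", "ATGTTC", "ATGAAC"]
    image = alignment_to_image(toy_alignment)
    for row in image:
        print(row)
    print(convolve_rows(image, [1, -1]))
```

In a real CNN the filter weights would be learned from simulated training data rather than fixed by hand, and the filter outputs would feed further layers that predict a model class or a parameter value.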

Original language: English (US)
Pages (from-to): 220-238
Number of pages: 19
Journal: Molecular Biology and Evolution
Volume: 36
Issue number: 2
DOI: 10.1093/molbev/msy224
State: Published - Jan 1 2019

Fingerprint

Population Genetics
neural networks
genomics
Metagenomics
Gene Flow
Sequence Alignment
Population Density
Gene Frequency
Genetic Recombination
recombination
population size
allele
statistical analysis
demographic statistics
learning
researchers

Keywords

  • Demographic inference
  • Introgression
  • Machine learning
  • Population genetics
  • Recombination
  • Selective sweeps

Cite this

The unreasonable effectiveness of convolutional neural networks in population genetic inference. / Flagel, Lex; Brandvain, Yaniv J; Schrider, Daniel R.

In: Molecular biology and evolution, Vol. 36, No. 2, 01.01.2019, p. 220-238.

@article{c4c7bcd246134952aa4cd4e4df2c5315,
title = "The unreasonable effectiveness of convolutional neural networks in population genetic inference",
keywords = "Demographic inference, Introgression, Machine learning, Population genetics, Recombination, Selective sweeps",
author = "Flagel, Lex and Brandvain, {Yaniv J} and Schrider, {Daniel R.}",
year = "2019",
month = "1",
day = "1",
doi = "10.1093/molbev/msy224",
language = "English (US)",
volume = "36",
pages = "220--238",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - The unreasonable effectiveness of convolutional neural networks in population genetic inference

AU - Flagel, Lex

AU - Brandvain, Yaniv J

AU - Schrider, Daniel R.

PY - 2019/1/1

Y1 - 2019/1/1

KW - Demographic inference

KW - Introgression

KW - Machine learning

KW - Population genetics

KW - Recombination

KW - Selective sweeps

UR - http://www.scopus.com/inward/record.url?scp=85061503500&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061503500&partnerID=8YFLogxK

U2 - 10.1093/molbev/msy224

DO - 10.1093/molbev/msy224

M3 - Article

VL - 36

SP - 220

EP - 238

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

SN - 0737-4038

IS - 2

ER -