Genome-Wide Association Studies: Information Theoretic Limits of Reliable Learning

Behrooz Tahmasebi, Mohammad Ali Maddah-Ali, Abolfazl S. Motahari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations


In a Genome-Wide Association Study (GWAS), the objective is to associate subsequences of individuals' genomes with observable characteristics called phenotypes. The genome, which contains the biological information of an individual, can be represented by a sequence of length G. Many observable characteristics of individuals can be related to a subsequence of a given length L, called the causal subsequence. Environmental effects make the relation between the causal subsequence and the observable characteristics a stochastic function. Our objective in this paper is to detect the causal subsequence of a specific phenotype using a dataset of N individuals and their observed characteristics. We introduce an abstract formulation of GWAS which allows us to investigate the problem from an information-theoretic perspective. In particular, as the parameters N, G, and L grow, we observe a threshold effect at $\frac{G h(L/G)}{N}$, where $h(\cdot)$ is the binary entropy function. This effect allows us to define the capacity of recovering the causal subsequence by denoting the rate of the GWAS problem as $\frac{G h(L/G)}{N}$. We develop an achievable scheme and a matching converse for this problem, and thus characterize its capacity in two scenarios: the zero-error-rate and the $\epsilon$-error-rate.
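As a rough illustration of the rate defined in the abstract, the following sketch evaluates G h(L/G) / N for given parameters. The function names and the sample values of G, L, and N are hypothetical and not taken from the paper. The comparison with log2 of the binomial coefficient rests on an added assumption, namely that the causal subsequence corresponds to a choice of L out of G positions, which the abstract does not state explicitly.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0 by convention."""
    if p < 0.0 or p > 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def gwas_rate(G: int, L: int, N: int) -> float:
    """Rate G * h(L/G) / N as written in the abstract (names are illustrative)."""
    if G <= 0 or N <= 0 or not 0 <= L <= G:
        raise ValueError("require G > 0, N > 0 and 0 <= L <= G")
    return G * binary_entropy(L / G) / N


def log2_position_choices(G: int, L: int) -> float:
    """log2 of C(G, L): bits needed to index a choice of L out of G positions.

    Assumption for illustration only: the abstract does not say that the
    causal subsequence is such a choice of positions.
    """
    return math.log2(math.comb(G, L))


if __name__ == "__main__":
    G, L, N = 1000, 10, 50  # hypothetical values, not from the paper
    print("rate G*h(L/G)/N  :", gwas_rate(G, L, N))
    print("G*h(L/G) in bits :", G * binary_entropy(L / G))
    print("log2 C(G, L)     :", log2_position_choices(G, L))
```

Under that assumption, the standard bound C(G, L) <= 2^{G h(L/G)} means the numerator G h(L/G) upper-bounds the number of bits needed to name the causal positions, so the rate can be read as bits to be learned per individual in the dataset.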

Original language: English (US)
Title of host publication: 2018 IEEE International Symposium on Information Theory, ISIT 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 5
ISBN (Print): 9781538647806
State: Published - Aug 15 2018
Externally published: Yes
Event: 2018 IEEE International Symposium on Information Theory, ISIT 2018 - Vail, United States
Duration: Jun 17 2018 to Jun 22 2018

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
ISSN (Print): 2157-8095


Other: 2018 IEEE International Symposium on Information Theory, ISIT 2018
Country/Territory: United States

Bibliographical note

Publisher Copyright:
© 2018 IEEE.


  • DNA sequencing
  • Genome-wide association studies
  • Threshold effect


