Information theory of mixed population genome-wide association studies

Behrooz Tahmasebi, Mohammad Ali Maddah-Ali, Seyed Abolfazl Motahari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

A genome-wide association study (GWAS) addresses the problem of associating subsequences of individuals' genomes with observable characteristics, called phenotypes. In a genome of length G, each characteristic is observed to depend only on a specific subsequence of length L, called the causal subsequence. The objective is to recover the causal subsequence from a dataset of N individuals' genomes and their observed characteristics. Recently, the problem was investigated from an information-theoretic point of view in [1], where the capacity of the problem was characterized and a threshold effect for reliable learning of the causal subsequence was shown at Gh(L/G). Here h(.) denotes the binary entropy function. However, [1] assumes that the dataset is collected from a single population and does not consider mixed population datasets, which arise in many practical settings. In this paper, we study the mixed population version of GWAS, in which the dataset is gathered from K subpopulations rather than one. Each subpopulation has its own causal subsequence for the observed characteristic, and the subpopulation origins of the individuals are latent. The objective is to recover all the causal subsequences with high accuracy. We investigate the fundamental limits of mixed population GWAS and characterize its capacity. For a special class of two subpopulations, the capacity is one-fourth of the capacity of the unmixed population case with the same parameters. The capacity of this problem is also connected to the capacity region of the multiple access channel (MAC).
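
As a rough illustration of the binary entropy function h(.) named in the abstract, the following minimal Python sketch evaluates h(p) and compares G·h(L/G) with the exact number of bits needed to index a choice of L positions out of G, log2 C(G, L). It assumes that a causal subsequence is identified by such a choice of positions. The function names and the values of G and L are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with the convention h(0) = h(1) = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def log2_num_position_choices(G: int, L: int) -> float:
    """Exact log2 of C(G, L), the number of ways to pick L positions out of G."""
    return math.log2(math.comb(G, L))


# Illustrative values only (assumptions, not taken from the paper).
G, L = 1000, 50

entropy_bits = G * binary_entropy(L / G)      # G * h(L/G)
exact_bits = log2_num_position_choices(G, L)  # log2 C(G, L)

print(f"G*h(L/G)     = {entropy_bits:.2f} bits")
print(f"log2 C(G, L) = {exact_bits:.2f} bits")
```

The two quantities agree to first order in G, which is the standard counting fact that C(G, L) is at most 2^(G·h(L/G)) and close to it for large G.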

Original language: English (US)
Title of host publication: 2018 IEEE Information Theory Workshop, ITW 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538635995
State: Published - Jul 2 2018
Externally published: Yes
Event: 2018 IEEE Information Theory Workshop, ITW 2018 - Guangzhou, China
Duration: Nov 25 2018 - Nov 29 2018

Publication series

Name: 2018 IEEE Information Theory Workshop, ITW 2018

Conference

Conference: 2018 IEEE Information Theory Workshop, ITW 2018
Country/Territory: China
City: Guangzhou
Period: 11/25/18 - 11/29/18

Bibliographical note

Publisher Copyright:
© 2018 IEEE Information Theory Workshop, ITW 2018. All rights reserved.

Keywords

  • DNA sequencing
  • Genome-wide association studies
  • Multiple access channel
  • Threshold effect
