Abstract
A Genome-Wide Association Study (GWAS) addresses the problem of associating subsequences of individuals' genomes with observable characteristics called phenotypes. In a genome of length G, each characteristic is observed to be related only to a specific subsequence of length L, called the causal subsequence. The objective is to recover the causal subsequence from a dataset of N individuals' genomes and their observed characteristics. Recently, the problem was investigated from an information-theoretic point of view in [1]. By characterizing the capacity of the problem, it was shown that there is a threshold effect for reliable learning of the causal subsequence at Gh(NL/G), where h(.) denotes the binary entropy function. However, [1] assumes that the dataset is collected from a single population and does not consider mixed-population datasets, which arise in many practical settings. In this paper, we study the mixed-population version of GWAS, in which the dataset is gathered from K subpopulations rather than one. Each subpopulation has its own causal subsequence for the observed characteristic, and the subpopulation origins of individuals are latent. The objective is to recover all the causal subsequences with high accuracy. We investigate the fundamental limits of mixed-population GWAS and characterize its capacity. For a special class of two subpopulations, the capacity is one-fourth of the capacity of the unmixed-population case with the same parameters. The capacity of this problem is also connected to the capacity region of the Multiple Access Channel (MAC).
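For reference, the binary entropy function h(.) mentioned in the abstract is conventionally defined as sketched below. The base-2 logarithm is an assumption made here for illustration; the abstract does not state which base is used in [1].

```latex
% Standard definition of the binary entropy function.
% Assumption: logarithms are taken to base 2 (entropy measured in bits).
h(p) = -p \log_2 p - (1 - p) \log_2 (1 - p), \qquad 0 \le p \le 1,
\quad \text{with the convention } 0 \log_2 0 = 0 .
```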
Original language | English (US) |
---|---|
Title of host publication | 2018 IEEE Information Theory Workshop, ITW 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781538635995 |
State | Published - Jul 2 2018 |
Externally published | Yes |
Event | 2018 IEEE Information Theory Workshop, ITW 2018 - Guangzhou, China; Duration: Nov 25 2018 → Nov 29 2018 |
Publication series
Name | 2018 IEEE Information Theory Workshop, ITW 2018 |
---|---|
Conference
Conference | 2018 IEEE Information Theory Workshop, ITW 2018 |
---|---|
Country/Territory | China |
City | Guangzhou |
Period | 11/25/18 → 11/29/18 |
Bibliographical note
Publisher Copyright: © 2018 IEEE Information Theory Workshop, ITW 2018. All rights reserved.
Keywords
- DNA sequencing
- Genome-wide association studies
- Multiple access channel
- Threshold effect