## Abstract

In the problem of population stratification, each data instance is generated by a finite mixture model with K mixture components and L observed variables. Each variable takes its value in a finite state space of cardinality M. The variables are drawn independently within each mixture component. In this paper, we study the identifiability of the parameters in this model, i.e., whether the parameters of a mixture model can be recovered from its mixture distribution. First, we define the notion of informative variables. Then we prove that the parameters are identifiable in the worst-case regime if and only if the number of informative variables is greater than or equal to 2K - 1. As a result, in the worst-case analysis of the identifiability problem for finite mixture models, the number of required informative variables is Θ(K), independent of the state space size.
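The model described in the abstract can be illustrated with a minimal sketch. It assumes a particular parameterization that the abstract does not spell out: mixing weights `weights` and per-component tables `components[k][l][m]`. These names and the function `mixture_distribution` are hypothetical. The example computes the mixture distribution for K = 2 components and a single binary variable, and shows two different parameter sets that yield the same distribution, which is the kind of ambiguity that identifiability rules out.

```python
from itertools import product
from math import isclose, prod


def mixture_distribution(weights, components):
    """Joint distribution of L variables under a K-component product mixture.

    weights: K mixing weights summing to 1 (illustrative name).
    components: components[k][l][m] = P(variable l = m | component k)
        (illustrative layout, not taken from the paper).
    Returns a dict mapping each outcome tuple to its probability.
    """
    num_vars = len(components[0])
    num_states = len(components[0][0])
    dist = {}
    for outcome in product(range(num_states), repeat=num_vars):
        # Variables are independent within a component, so the
        # per-component probability is a product over variables.
        dist[outcome] = sum(
            w * prod(comp[l][x] for l, x in enumerate(outcome))
            for w, comp in zip(weights, components)
        )
    return dist


# K = 2 components, L = 1 variable, M = 2 states.
params_a = [[[0.8, 0.2]], [[0.2, 0.8]]]
params_b = [[[0.6, 0.4]], [[0.4, 0.6]]]

dist_a = mixture_distribution([0.5, 0.5], params_a)
dist_b = mixture_distribution([0.5, 0.5], params_b)

# Different parameters, identical mixture distribution.
same = all(isclose(dist_a[o], dist_b[o]) for o in dist_a)
print(same)
```

With a single variable, the two parameter sets cannot be told apart from the mixture distribution alone. This agrees with the stated threshold, since one variable is fewer than 2K - 1 = 3.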

Original language | English (US) |
---|---|
Title of host publication | 2018 IEEE International Symposium on Information Theory, ISIT 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1051-1055 |
Number of pages | 5 |
ISBN (Print) | 9781538647806 |
DOIs | |
State | Published - Aug 15 2018 |
Externally published | Yes |
Event | 2018 IEEE International Symposium on Information Theory, ISIT 2018 - Vail, United States. Duration: Jun 17 2018 → Jun 22 2018 |

### Publication series

Name | IEEE International Symposium on Information Theory - Proceedings |
---|---|
Volume | 2018-June |
ISSN (Print) | 2157-8095 |

### Other

Other | 2018 IEEE International Symposium on Information Theory, ISIT 2018 |
---|---|
Country/Territory | United States |
City | Vail |
Period | 6/17/18 → 6/22/18 |

### Bibliographical note

Publisher Copyright: © 2018 IEEE.

## Keywords

- Identifiability
- Population genetics
- Population stratification