Abstract
We consider the problem of constructing a synthetic sample from a population of interest that cannot be sampled directly but for which the population means of some of its variables are known. In addition, we assume that we have in hand samples from two similar populations. Using the known population means, we select subsamples from the samples of the other two populations and combine them to construct the synthetic sample. The synthetic sample is obtained by solving an optimization problem in which the known population means are used as constraints. The optimization is carried out with an adaptive random search algorithm. Simulation studies are presented to demonstrate the effectiveness of our approach. We observe that, on average, such synthetic samples behave very much like actual samples from the population of interest. As an application, we consider constructing a one-percent synthetic sample for the missing 1890 decennial sample of the United States.
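The selection step described above can be illustrated with a minimal sketch. It is not the algorithm from the article, whose objective and search rule are not given in this abstract. The sketch assumes that the two donor samples are pooled, that the known means are handled as a squared-error penalty rather than as hard constraints, and that "adaptive" means shrinking the number of units swapped per proposal after repeated failures. The names `mean_gap` and `synthetic_sample` and all parameter values are hypothetical.

```python
import random


def mean_gap(rows, targets):
    """Sum of squared differences between the column means of rows and targets."""
    n = len(rows)
    return sum(
        (sum(row[j] for row in rows) / n - t) ** 2
        for j, t in enumerate(targets)
    )


def synthetic_sample(pool, targets, size, iters=5000, seed=0):
    """Pick `size` rows from `pool` whose means are close to `targets`.

    Assumption: a simple swap-based random search with a shrinking swap
    count, used here only as a stand-in for an adaptive random search.
    """
    rng = random.Random(seed)
    idx = list(range(len(pool)))
    rng.shuffle(idx)
    chosen, rest = idx[:size], idx[size:]
    best = mean_gap([pool[i] for i in chosen], targets)
    swaps = max(1, size // 4)  # units exchanged per proposal
    fails = 0
    for _ in range(iters):
        k = min(swaps, len(chosen), len(rest))
        out_pos = rng.sample(range(len(chosen)), k)
        in_pos = rng.sample(range(len(rest)), k)
        # Propose: exchange k selected units with k unselected units.
        for a, b in zip(out_pos, in_pos):
            chosen[a], rest[b] = rest[b], chosen[a]
        score = mean_gap([pool[i] for i in chosen], targets)
        if score < best:
            best, fails = score, 0
        else:
            # Reject: repeating the same exchanges restores the previous state.
            for a, b in zip(out_pos, in_pos):
                chosen[a], rest[b] = rest[b], chosen[a]
            fails += 1
            if fails >= 50 and swaps > 1:
                # Adapt: after many failures, make smaller moves.
                swaps, fails = swaps // 2, 0
    return [pool[i] for i in chosen], best


# Illustrative data only: two simulated donor samples with two variables each.
data_rng = random.Random(42)
donor_a = [(data_rng.gauss(0, 1), data_rng.gauss(5, 2)) for _ in range(100)]
donor_b = [(data_rng.gauss(1, 1), data_rng.gauss(7, 2)) for _ in range(100)]
known_means = (0.5, 6.0)  # assumed known means of the target population

sample, gap = synthetic_sample(donor_a + donor_b, known_means, size=40)
print(len(sample), gap)
```

Because a proposal is accepted only when it lowers the penalty, the returned gap can never exceed that of the initial random subsample. A faithful implementation would also need to account for whatever additional criteria the article's optimization problem includes.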
Original language | English (US) |
---|---|
Pages (from-to) | 113-127 |
Number of pages | 15 |
Journal | Journal of Official Statistics |
Volume | 32 |
Issue number | 1 |
State | Published - Mar 2016 |
Bibliographical note
Publisher Copyright: © Statistics Sweden.
Keywords
- Missing data
- Sample survey
- Synthetic samples