### Abstract

We consider the problem of constructing a synthetic sample from a population of interest that cannot be sampled but for which the population means of some of its variables are known. In addition, we assume that we have in hand samples from two similar populations. Using the known population means, we select subsamples from the samples of the other two populations and then combine them to construct the synthetic sample. The synthetic sample is obtained by solving an optimization problem in which the known population means are used as constraints. The optimization is carried out with an adaptive random search algorithm. Simulation studies are presented to demonstrate the effectiveness of our approach. We observe that, on average, such synthetic samples behave very much like actual samples from the population of interest. As an application, we consider constructing a one-percent synthetic sample for the missing 1890 decennial sample of the United States.
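The selection step described in the abstract can be pictured with a small sketch. This is not the authors' algorithm: the abstract does not give its details, so the pooled-donor setup, the swap proposal, the rule that shrinks the number of swaps after repeated failures, and all function names here are assumptions made for illustration. The known means are also folded into a squared-error objective rather than imposed as hard constraints, which is a simplification.

```python
import random

def column_means(rows):
    """Mean of each variable across a list of equal-length tuples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def loss(rows, target_means):
    """Squared distance between the subsample means and the known means."""
    return sum((m - t) ** 2 for m, t in zip(column_means(rows), target_means))

def build_synthetic_sample(sample_a, sample_b, target_means, size,
                           iterations=5000, seed=0):
    """Pick `size` units from two donor samples by random search.

    Hypothetical illustration: the proposal and adaptation rules are
    assumptions, not the procedure from the paper.
    """
    rng = random.Random(seed)
    pool = list(sample_a) + list(sample_b)
    if not 0 < size < len(pool):
        raise ValueError("size must be between 1 and len(pool) - 1")
    chosen = rng.sample(range(len(pool)), size)   # indices currently selected
    selected = set(chosen)
    unused = [i for i in range(len(pool)) if i not in selected]
    best = loss([pool[i] for i in chosen], target_means)
    swaps = max(1, size // 4)                     # units exchanged per proposal
    stall = 0
    for _ in range(iterations):
        cand, rest = chosen[:], unused[:]
        for _ in range(swaps):
            i = rng.randrange(len(cand))
            j = rng.randrange(len(rest))
            cand[i], rest[j] = rest[j], cand[i]   # swap a unit in and one out
        value = loss([pool[i] for i in cand], target_means)
        if value < best:                          # keep only improvements
            chosen, unused, best = cand, rest, value
            stall = 0
        else:
            stall += 1
            if stall >= 50 and swaps > 1:         # adapt: try smaller moves
                swaps = max(1, swaps // 2)
                stall = 0
    return [pool[i] for i in chosen], best

# Toy donor samples with two variables each (made-up numbers).
rng = random.Random(1)
sample_a = [(rng.gauss(40, 10), rng.gauss(2, 1)) for _ in range(200)]
sample_b = [(rng.gauss(50, 10), rng.gauss(3, 1)) for _ in range(200)]
known_means = (46.0, 2.6)

synthetic, final_loss = build_synthetic_sample(
    sample_a, sample_b, known_means, size=50)
print(column_means(synthetic), final_loss)
```

Because a proposal is accepted only when it lowers the loss, the returned subsample is never further from the known means than the random starting subsample. Halving the number of swapped units after a run of rejected proposals is one simple way to make the search adaptive; other schedules would work equally well in a sketch like this.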

Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 113-127 |
Number of pages | 15 |
Journal | Journal of Official Statistics |
Volume | 32 |
Issue number | 1 |
DOIs | https://doi.org/10.1515/JOS-2016-0005 |
State | Published - Mar 2016 |

### Keywords

- Missing data
- Sample survey
- Synthetic samples

### Cite this

Dong, H., & Meeden, G. (2016). Constructing synthetic samples. *Journal of Official Statistics*, *32*(1), 113-127. https://doi.org/10.1515/JOS-2016-0005

**Constructing synthetic samples.** / Dong, Hua; Meeden, Glen.

Research output: Contribution to journal › Article


TY - JOUR

T1 - Constructing synthetic samples

AU - Dong, Hua

AU - Meeden, Glen

PY - 2016/3

Y1 - 2016/3

AB - We consider the problem of constructing a synthetic sample from a population of interest that cannot be sampled but for which the population means of some of its variables are known. In addition, we assume that we have in hand samples from two similar populations. Using the known population means, we select subsamples from the samples of the other two populations and then combine them to construct the synthetic sample. The synthetic sample is obtained by solving an optimization problem in which the known population means are used as constraints. The optimization is carried out with an adaptive random search algorithm. Simulation studies are presented to demonstrate the effectiveness of our approach. We observe that, on average, such synthetic samples behave very much like actual samples from the population of interest. As an application, we consider constructing a one-percent synthetic sample for the missing 1890 decennial sample of the United States.

KW - Missing data

KW - Sample survey

KW - Synthetic samples

UR - http://www.scopus.com/inward/record.url?scp=84960429486&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84960429486&partnerID=8YFLogxK

U2 - 10.1515/JOS-2016-0005

DO - 10.1515/JOS-2016-0005

M3 - Article

AN - SCOPUS:84960429486

VL - 32

SP - 113

EP - 127

JO - Journal of Official Statistics

JF - Journal of Official Statistics

SN - 0282-423X

IS - 1

ER -