Constructing synthetic samples

Hua Dong, Glen Meeden

Research output: Contribution to journal (Article)

Abstract

We consider the problem of constructing a synthetic sample for a population of interest that cannot be sampled directly but for which the population means of some of its variables are known. In addition, we assume that we have in hand samples from two similar populations. Using the known population means, we select subsamples from the samples of these two similar populations and combine them to construct the synthetic sample. The synthetic sample is obtained by solving an optimization problem in which the known population means are used as constraints. The optimization is carried out by an adaptive random search algorithm. Simulation studies are presented to demonstrate the effectiveness of our approach. We observe that, on average, such synthetic samples behave very much like actual samples from the population of interest. As an application, we consider constructing a one-percent synthetic sample for the missing 1890 decennial sample of the United States.
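The subsample selection described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes the two donor samples are simply pooled, it replaces the mean constraints with a squared-distance penalty, and it uses a plain swap-based random search rather than the adaptive random search of the article. The names mean_gap and build_synthetic_sample, the simulated donor data, and all parameter values are hypothetical.

```python
import random


def mean_gap(rows, target):
    """Squared distance between the column means of `rows` and the known means.

    Only the first len(target) columns are compared; any further columns are
    treated as variables whose population means are unknown.
    """
    n = len(rows)
    return sum((sum(r[j] for r in rows) / n - t) ** 2
               for j, t in enumerate(target))


def build_synthetic_sample(donor_a, donor_b, target, n, iters=2000, seed=0):
    """Pick n units from the pooled donor samples whose means approach `target`.

    Assumption: a simple random search that swaps one selected unit for one
    unselected unit and keeps the swap only if the gap shrinks.
    Requires 0 < n < len(donor_a) + len(donor_b).
    """
    rng = random.Random(seed)
    pool = list(donor_a) + list(donor_b)
    order = list(range(len(pool)))
    rng.shuffle(order)
    chosen, rest = order[:n], order[n:]
    best = mean_gap([pool[k] for k in chosen], target)
    for _ in range(iters):
        i, j = rng.randrange(len(chosen)), rng.randrange(len(rest))
        chosen[i], rest[j] = rest[j], chosen[i]        # propose a swap
        gap = mean_gap([pool[k] for k in chosen], target)
        if gap < best:
            best = gap                                 # keep the improvement
        else:
            chosen[i], rest[j] = rest[j], chosen[i]    # undo the swap
    return [pool[k] for k in chosen], best


# Simulated donor samples (hypothetical data): the first column has a known
# target mean of 1.0, the second column is an unconstrained variable.
data_rng = random.Random(1)
donor_a = [(data_rng.gauss(0, 1), data_rng.gauss(5, 1)) for _ in range(200)]
donor_b = [(data_rng.gauss(2, 1), data_rng.gauss(7, 1)) for _ in range(200)]
sample, gap = build_synthetic_sample(donor_a, donor_b, target=(1.0,), n=50)
print(len(sample), gap)
```

A penalty on the gap between subsample means and known means is used here only because it keeps the sketch short. A version closer to the abstract would treat the known means as constraints and adapt the search as it proceeds.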

Original language: English (US)
Pages (from-to): 113-127
Number of pages: 15
Journal: Journal of Official Statistics
Volume: 32
Issue number: 1
DOI: 10.1515/JOS-2016-0005
State: Published - Mar 2016

Keywords

  • Missing data
  • Sample survey
  • Synthetic samples

Cite this

Constructing synthetic samples. / Dong, Hua; Meeden, Glen.

In: Journal of Official Statistics, Vol. 32, No. 1, 03.2016, p. 113-127.

@article{92e0c37315b34a44bc7058ffffb9dcb5,
title = "Constructing synthetic samples",
abstract = "We consider the problem of constructing a synthetic sample for a population of interest that cannot be sampled directly but for which the population means of some of its variables are known. In addition, we assume that we have in hand samples from two similar populations. Using the known population means, we select subsamples from the samples of these two similar populations and combine them to construct the synthetic sample. The synthetic sample is obtained by solving an optimization problem in which the known population means are used as constraints. The optimization is carried out by an adaptive random search algorithm. Simulation studies are presented to demonstrate the effectiveness of our approach. We observe that, on average, such synthetic samples behave very much like actual samples from the population of interest. As an application, we consider constructing a one-percent synthetic sample for the missing 1890 decennial sample of the United States.",
keywords = "Missing data, Sample survey, Synthetic samples",
author = "Hua Dong and Glen Meeden",
year = "2016",
month = "3",
doi = "10.1515/JOS-2016-0005",
language = "English (US)",
volume = "32",
pages = "113--127",
journal = "Journal of Official Statistics",
issn = "0282-423X",
publisher = "Statistics Sweden",
number = "1",

}

TY - JOUR

T1 - Constructing synthetic samples

AU - Dong, Hua

AU - Meeden, Glen

PY - 2016/3

Y1 - 2016/3

AB - We consider the problem of constructing a synthetic sample for a population of interest that cannot be sampled directly but for which the population means of some of its variables are known. In addition, we assume that we have in hand samples from two similar populations. Using the known population means, we select subsamples from the samples of these two similar populations and combine them to construct the synthetic sample. The synthetic sample is obtained by solving an optimization problem in which the known population means are used as constraints. The optimization is carried out by an adaptive random search algorithm. Simulation studies are presented to demonstrate the effectiveness of our approach. We observe that, on average, such synthetic samples behave very much like actual samples from the population of interest. As an application, we consider constructing a one-percent synthetic sample for the missing 1890 decennial sample of the United States.

KW - Missing data

KW - Sample survey

KW - Synthetic samples

UR - http://www.scopus.com/inward/record.url?scp=84960429486&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84960429486&partnerID=8YFLogxK

U2 - 10.1515/JOS-2016-0005

DO - 10.1515/JOS-2016-0005

M3 - Article

AN - SCOPUS:84960429486

VL - 32

SP - 113

EP - 127

JO - Journal of Official Statistics

JF - Journal of Official Statistics

SN - 0282-423X

IS - 1

ER -