Virtually all quantitative microdata used by social scientists derive from samples that incorporate clustering, stratification, and weighting adjustments (Kish 1965, 1992). Such data can yield standard error estimates that differ dramatically from those for a simple random sample of the same size. Researchers using historical U.S. census microdata, however, usually apply methods designed for simple random samples. The resulting p-values and confidence intervals could be inaccurate and could lead to erroneous research conclusions. Because U.S. census microdata samples are among the most widely used sources for social science and policy research, the need for reliable standard error estimation is critical. We evaluate the historical microdata samples of the IPUMS project from 1850-1930 to determine (1) the impact of sample design on standard error estimates and (2) how to apply modern standard error estimation software to historical census samples. We exploit a unique new data source from the 1880 census to validate our methods for standard error estimation, and then apply this approach to the 1850-1870 and 1900-1930 decennial censuses. We conclude that Taylor series estimation can be used effectively with the historical decennial census microdata samples, and that it should be applied in research analyses that have the potential for substantial clustering effects.
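To make the contrast between design-based and simple-random-sample standard errors concrete, the following is a minimal sketch of a Taylor series (linearized) standard error for a weighted mean. It is an illustration under stated assumptions, not the procedure or software used in the paper: the function name `linearized_se_of_mean`, the toy household data, and the choice to treat clusters as sampled with replacement within strata (no finite population correction) are all assumptions made here.

```python
from collections import defaultdict
from math import sqrt


def linearized_se_of_mean(y, weights, strata, clusters):
    """Taylor-series (linearized) standard error of a weighted mean.

    Assumes clusters are sampled with replacement within strata (no finite
    population correction) and that every stratum has at least two clusters.
    Returns the pair (mean, standard_error).
    """
    total_w = sum(weights)
    mean = sum(w * v for w, v in zip(weights, y)) / total_w

    # Linearized score for each record, summed to the cluster level.
    cluster_scores = defaultdict(float)
    for v, w, h, c in zip(y, weights, strata, clusters):
        cluster_scores[(h, c)] += w * (v - mean) / total_w

    # Group the cluster totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _), z in cluster_scores.items():
        by_stratum[h].append(z)

    # Between-cluster variance of the scores, accumulated over strata.
    variance = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than two clusters")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, sqrt(variance)


# Hypothetical toy data: a 0/1 outcome that is identical within households.
y = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1]
households = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5]
weights = [1.0] * len(y)
strata = ["A"] * len(y)

# Design-based estimate: households are the clusters.
mean, clustered_se = linearized_se_of_mean(y, weights, strata, households)

# Naive estimate: every person is treated as an independent draw.
_, srs_se = linearized_se_of_mean(y, weights, strata, list(range(len(y))))

print(f"mean={mean:.3f} clustered SE={clustered_se:.3f} SRS SE={srs_se:.3f}")
```

With equal weights and every record treated as its own cluster, the function reduces to the familiar simple-random-sample formula, the square root of s²/n. In this toy example the household-clustered standard error is larger than the naive one because the outcome is perfectly correlated within households, which is the kind of clustering effect the abstract refers to.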
|Original language||English (US)|
|State||Published - 2007|
|Name||Minnesota Population Center Working Paper Series|