Estimates of Sampling Error

Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the ZDHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.

If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the ZDHS sample is the result of a two-stage stratified design, and, consequently, it was necessary to use more complex formulas. The computer software used to calculate sampling errors for the ZDHS is the ISSA Sampling Error Module. This module uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
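
The idea behind Jackknife repeated replication can be sketched as follows. This is a generic delete-one-cluster jackknife for a ratio estimate, not the ISSA implementation: the function names and the toy data are hypothetical, and stratification and sampling weights are ignored for brevity.

```python
import math

def ratio(y, x):
    """Ratio estimate r = sum(y) / sum(x) over clusters."""
    return sum(y) / sum(x)

def jackknife_se(y, x):
    """Delete-one-cluster jackknife standard error of the ratio (assumed form).

    y[i] and x[i] are the numerator and denominator totals for cluster i.
    Each replicate drops one cluster and recomputes the ratio.
    """
    k = len(y)
    r = ratio(y, x)
    total = 0.0
    for i in range(k):
        # Ratio recomputed with cluster i left out
        r_without_i = ratio(y[:i] + y[i + 1:], x[:i] + x[i + 1:])
        # Pseudo-value for cluster i
        pseudo = k * r - (k - 1) * r_without_i
        total += (pseudo - r) ** 2
    return math.sqrt(total / (k * (k - 1)))

# Hypothetical cluster totals (e.g., events and exposure per cluster)
events = [12.0, 9.0, 15.0, 11.0]
exposure = [100.0, 90.0, 110.0, 95.0]
print(f"ratio = {ratio(events, exposure):.4f}, SE = {jackknife_se(events, exposure):.4f}")
```

When every cluster has the same denominator, this reduces to the usual standard error of a mean across clusters, which is a convenient sanity check on the sketch.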

In addition to the standard error, ISSA computes the design effect (DEFT) for each estimate, which is defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A DEFT value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates the increase in the sampling error due to the use of a more complex and less statistically efficient design. ISSA also computes the relative error and confidence limits for the estimates.
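
As a rough illustration of the DEFT ratio, the sketch below assumes that the simple-random-sample standard error of a proportion is approximated by sqrt(p(1-p)/n). The function names and input numbers are hypothetical and are not taken from ISSA or from the ZDHS tables.

```python
import math

def srs_standard_error(p, n):
    """Approximate SE of a proportion p under simple random sampling of size n."""
    return math.sqrt(p * (1.0 - p) / n)

def deft(se_design, se_srs):
    """Ratio of the design-based SE to the simple-random-sample SE.

    Returns None when the SRS standard error is zero, since the ratio
    is then undefined.
    """
    if se_srs == 0:
        return None
    return se_design / se_srs

# Hypothetical inputs: a proportion of 0.25 from 7,500 cases,
# with a design-based standard error of 0.0065
se_srs = srs_standard_error(0.25, 7500)
print(f"SRS SE = {se_srs:.4f}, DEFT = {deft(0.0065, se_srs):.2f}")
```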

Sampling errors for the ZDHS are calculated for selected variables considered to be of primary interest. The results are presented in an appendix to the Final Report for the country as a whole, for urban and rural areas, and for the nine provinces. For each variable, the type of statistic (mean, proportion, or rate) and the base population are given in Table B.1 of the Final Report. Tables B.2 to B.13 present the value of the statistic (R), its standard error (SE), the number of unweighted (N) and weighted (WN) cases, the design effect (DEFT), the relative standard error (SE/R), and the 95 percent confidence limits (R±2SE), for each variable. The DEFT is considered undefined when the standard error under simple random sampling is zero (when the estimate is close to 0 or 1). In the case of the total fertility rate, the number of unweighted cases is not relevant, as there is no known unweighted value for woman-years of exposure to child-bearing.

The confidence interval (e.g., as calculated for children ever born to women aged 15-49) can be interpreted as follows: the overall average from the national sample is 3.037 and its standard error is 0.038. Therefore, to obtain the 95 percent confidence limits, one adds twice the standard error to, and subtracts it from, the sample estimate, i.e., 3.037 ± 2 × 0.038. There is a high probability (95 percent) that the true average number of children ever born to all women aged 15 to 49 is between 2.961 and 3.113.
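
The arithmetic in this example can be reproduced with a minimal sketch. The function names are illustrative only; the inputs are the estimate and standard error quoted above for children ever born.

```python
def confidence_limits(estimate, standard_error, multiplier=2.0):
    """Return (lower, upper) limits as estimate -/+ multiplier * SE."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width

def relative_standard_error(estimate, standard_error):
    """Standard error expressed as a fraction of the estimate (SE/R)."""
    return standard_error / estimate

# Children ever born, women aged 15-49: R = 3.037, SE = 0.038
lower, upper = confidence_limits(3.037, 0.038)
print(f"95 percent confidence limits: {lower:.3f} to {upper:.3f}")
print(f"relative standard error: {relative_standard_error(3.037, 0.038):.4f}")
```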

Sampling errors are analyzed for the national sample and for two separate groups of estimates: (1) means and proportions, and (2) complex demographic rates. The relative standard errors (SE/R) for the means and proportions range between 0.2 percent and 20 percent with an average of 3.5 percent; the highest relative standard errors are for estimates of very low values (e.g., currently using injections among women who were currently using a contraceptive method). If estimates of very low values (less than 10 percent) were removed, then the average drops to 2.1 percent. So in general, the relative standard errors for most estimates for the country as a whole are small, except for estimates of very small proportions. The relative standard error for the total fertility rate is small, 2 percent. However, for the mortality rates, the average relative standard error is somewhat higher, 4.6 percent.

There are differentials in the relative standard error for the estimates of sub-populations. For example, for the variable secondary education or higher, the relative standard errors as a percent of the estimated mean for the whole country, for the rural areas, and for Northern Province are 4 percent, 7.8 percent, and 13.5 percent, respectively.

For the total sample, the value of the design effect (DEFT), averaged over all variables, is 1.27, which means that, due to multi-stage clustering of the sample, variance is increased by a factor of 1.6 over that in an equivalent simple random sample.
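
The factor of 1.6 follows because DEFT is a ratio of standard errors, while variance is the square of the standard error. As a worked step (the subscripts "design" and "SRS" are labels introduced here for illustration):

```latex
\frac{\mathrm{Var}_{\text{design}}}{\mathrm{Var}_{\text{SRS}}}
  = \left(\frac{\mathrm{SE}_{\text{design}}}{\mathrm{SE}_{\text{SRS}}}\right)^{2}
  = \mathrm{DEFT}^{2}
  = 1.27^{2}
  \approx 1.61
```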

Finally, the 1996-97 ZDHS sample consisted mostly of the same enumeration areas selected for the 1992 ZDHS; therefore, there was a strong interest in the calculation of sampling errors for the change in rates between the two surveys. Because the two samples were not independent, it is possible to detect change in a particular rate during the period between the two surveys with a smaller sample than if the two samples had been independent. To obtain a measure of the sampling error of the difference in rates between the two surveys, for example, the contraceptive prevalence rate, it is necessary to calculate the correlation between the values of the contraceptive prevalence rate for the two surveys at the cluster level and then apply the following formula to calculate the corresponding sampling error:

$$\mathrm{se}(p_1 - p_2) = \sqrt{\mathrm{se}^2(p_1) + \mathrm{se}^2(p_2) - 2\,\rho\,\sqrt{\mathrm{se}^2(p_1)\,\mathrm{se}^2(p_2)}}$$
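
A minimal sketch of this calculation follows, where rho stands for the cluster-level correlation between the two surveys' values of the rate. The function name and the input values are hypothetical and are not ZDHS results.

```python
import math

def se_of_difference(se1, se2, rho):
    """Standard error of p1 - p2 when the two estimates are correlated.

    se1, se2: standard errors of the rate in the two surveys
    rho: cluster-level correlation between the two surveys' values
    """
    variance = se1 ** 2 + se2 ** 2 - 2.0 * rho * math.sqrt(se1 ** 2 * se2 ** 2)
    # Guard against a tiny negative value caused by floating-point rounding
    return math.sqrt(max(variance, 0.0))

# Hypothetical standard errors for the same rate in two surveys
independent = se_of_difference(0.012, 0.011, 0.0)
correlated = se_of_difference(0.012, 0.011, 0.5)
print(f"rho = 0.0: {independent:.4f}")
print(f"rho = 0.5: {correlated:.4f}")
```

With a positive correlation the standard error of the difference is smaller than under independence, which is why a change can be detected with a smaller sample when the two surveys share enumeration areas.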