Survey ID Number
mwi-nso-dhs-2010-v1
Title
Demographic and Health Survey 2010, Malawi
Sampling Procedure
The 2010 MDHS called for a nationally representative sample of about 25,600 interviews of women between the ages of 15 and 49. The survey was designed to provide information on fertility and childhood mortality, family planning, maternal and child health, knowledge and behaviour regarding AIDS and other sexually transmitted infections (STIs), domestic violence, HIV prevalence, and other health issues among the adult population.
Administratively, Malawi is divided into 28 districts. The sample was designed to provide estimates of most health and demographic indicators for 27 districts: Likoma, which is small, was combined with Nkhata Bay. Indicators are also shown for the Northern, Central, and Southern Regions of the country.
- Northern Region: Chitipa, Karonga, Likoma, Mzimba, Nkhata Bay, and Rumphi
- Central Region: Dedza, Dowa, Kasungu, Lilongwe, Mchinji, Nkhotakota, Ntcheu, Ntchisi, and Salima
- Southern Region: Balaka, Blantyre, Chikhwawa, Chiradzulu, Machinga, Mangochi, Mulanje, Mwanza, Neno, Nsanje, Phalombe, Thyolo, and Zomba
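The region lists and the Likoma merge can be checked with a small lookup table. The sketch below, in Python, counts the 28 districts and derives the 27 study domains; the names REGIONS, MERGED, and study_domains are illustrative assumptions, not part of the survey documentation.

```python
# Districts by region, as listed above.
REGIONS = {
    "Northern": ["Chitipa", "Karonga", "Likoma", "Mzimba", "Nkhata Bay", "Rumphi"],
    "Central": ["Dedza", "Dowa", "Kasungu", "Lilongwe", "Mchinji", "Nkhotakota",
                "Ntcheu", "Ntchisi", "Salima"],
    "Southern": ["Balaka", "Blantyre", "Chikhwawa", "Chiradzulu", "Machinga",
                 "Mangochi", "Mulanje", "Mwanza", "Neno", "Nsanje", "Phalombe",
                 "Thyolo", "Zomba"],
}

# Likoma is too small to be a study domain on its own, so it joins Nkhata Bay.
MERGED = {"Likoma": "Nkhata Bay"}


def study_domains(regions, merged):
    """Return the sorted study domains after merging small districts."""
    domains = set()
    for districts_in_region in regions.values():
        for district in districts_in_region:
            domains.add(merged.get(district, district))
    return sorted(domains)


districts = [d for ds in REGIONS.values() for d in ds]
domains = study_domains(REGIONS, MERGED)
print(len(districts), len(domains))  # 28 27
```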
In addition, a men's survey was conducted in a subsample of one in three households selected for the women's survey. All men age 15-54 in the subsample of households were eligible for the men's survey. The men's survey was designed to collect information on family planning, knowledge and behaviour regarding AIDS and other STIs, and adult health issues. All men age 15-54 and all women age 15-49 in the households selected for the men's survey were also eligible for HIV testing.
SAMPLING FRAME
The sampling frame used for the 2010 MDHS was based on summary data for the enumeration areas (EAs) of the 2008 Malawi Population and Housing Census (PHC). The sampling frame consists of 9,145 EAs throughout the nation. Maps delineating the EA boundaries were created. Of the 9,145 EAs, 1,076 are urban and 8,069 are rural. The EA size (i.e., number of regular households in the EA or village) varies from 0 to 954, with an average of 249 households. The sampling frame was stratified into the 27 districts. Within each of the districts, the sampling frame was further stratified by urban and rural areas.
SAMPLE ALLOCATION
Sample allocation is an important part of sample design because it determines survey precision. When accurate information on the main survey indicators at the domain level is lacking, the best allocation for national-level precision is proportional allocation, in which each domain's sample is proportional to its population size. However, because the desired national sample is large (at least 27,200 households), national precision was not the only goal of the 2010 MDHS design. With 27 study domains, precision at the domain level was also an important objective.
To ensure comparability across the study domains, the sample size for each domain should be similar. Because district population sizes vary widely, proportional allocation would have produced very different levels of precision across districts, so it was not used. The initial plan called for a flat sample of 1,000 households per district. This plan was revised to allow a larger sample in Lilongwe and Blantyre, which contain the country's major urban centres: the sample in these two districts was increased to 1,300 households, and the target in the eight smallest districts was reduced from 1,000 to 950 households, keeping the national target approximately the same (27,345 households in the final design). Relative to proportional allocation, this approach undersamples the larger domains and oversamples the smaller ones so that each domain is adequately represented. Because the urban population is small (10 percent), urban areas were also oversampled so that survey precision is comparable across urban and rural areas.
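The revised household allocation can be totalled directly. This sketch assumes that Lilongwe and Blantyre are not among the eight smallest districts, so 17 domains keep the flat 1,000 households; the constant names are illustrative. Under that assumption the revised plan totals 27,200 households, which matches the "at least 27,200 households" figure given earlier; the 27,345 figure appears to be the total after conversion into whole clusters.

```python
# Household allocation by study domain under the revised plan described above.
# Assumption: the two large districts and the eight smallest do not overlap.
N_DOMAINS = 27
LARGE_COUNT, LARGE_HH = 2, 1300    # Lilongwe and Blantyre
SMALL_COUNT, SMALL_HH = 8, 950     # the eight smallest districts
DEFAULT_HH = 1000                  # flat allocation for the remaining domains


def revised_total():
    """Total households implied by the revised allocation."""
    n_default = N_DOMAINS - LARGE_COUNT - SMALL_COUNT
    return (n_default * DEFAULT_HH
            + LARGE_COUNT * LARGE_HH
            + SMALL_COUNT * SMALL_HH)


flat_plan = N_DOMAINS * DEFAULT_HH
revised_plan = revised_total()
print(flat_plan, revised_plan)  # 27000 27200
```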
The sample allocation between urban and rural areas is a power allocation, a compromise between proportional allocation and equal-size allocation in which each stratum's sample is proportional to its population size raised to a power between 0 and 1. The power value is chosen to achieve a satisfactory sample size in each area. Oversampling or undersampling a particular domain does not affect representativeness as long as sampling weights are properly calculated and applied in tabulation.
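A power allocation gives stratum h a sample proportional to N_h raised to a power α between 0 and 1: α = 1 reproduces proportional allocation and α = 0 gives equal allocation. The source does not state the power value used for the MDHS, and the household counts below are hypothetical; only the 10 percent urban share comes from the text. This is a minimal sketch that leaves the results unrounded.

```python
def power_allocation(sizes, total, alpha):
    """Allocate `total` sample units across strata in proportion to size**alpha.

    alpha = 1 gives proportional allocation; alpha = 0 gives equal allocation.
    Results are returned unrounded.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    powered = [s ** alpha for s in sizes]
    scale = total / sum(powered)
    return [p * scale for p in powered]


# Hypothetical district: 10% urban and 90% rural households, 1,000 households sampled.
sizes = [10_000, 90_000]  # urban, rural (illustrative numbers only)
for alpha in (1.0, 0.5, 0.0):
    urban, rural = power_allocation(sizes, 1000, alpha)
    print(alpha, round(urban), round(rural))
```

Intermediate values of α move the urban share away from its population share (100 of 1,000 here) toward an equal split, which is how urban areas end up oversampled; the weights then undo that oversampling in tabulation.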
The sample allocation must then be converted into a number of primary sampling units (PSUs), or clusters. It was decided to select 20 households per urban cluster and 35 households per rural cluster. The total number of clusters is 849 (158 urban and 691 rural), and the total number of households selected is 27,345 (3,160 urban and 24,185 rural).
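The cluster and household totals above are consistent with each other, as this short check shows. The DESIGN dictionary layout is an illustrative choice; the numbers are taken from the text.

```python
# Cluster counts and households per cluster, from the text above.
DESIGN = {
    "urban": {"clusters": 158, "households_per_cluster": 20},
    "rural": {"clusters": 691, "households_per_cluster": 35},
}


def households_by_area(design):
    """Selected households per area = clusters x households per cluster."""
    return {area: d["clusters"] * d["households_per_cluster"]
            for area, d in design.items()}


hh = households_by_area(DESIGN)
n_clusters = sum(d["clusters"] for d in DESIGN.values())
print(hh, n_clusters, sum(hh.values()))
# {'urban': 3160, 'rural': 24185} 849 27345
```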
SAMPLING PROCEDURE AND UPDATING OF THE SAMPLING FRAME
The 2010 MDHS sample is a stratified sample selected in two stages. Stratification is achieved by separating each study domain into urban and rural areas. Areas are defined as urban or rural based on the classification in the 2008 Malawi PHC. Therefore, the 27 domains are stratified into a total of 54 sampling strata.
Samples were selected independently in each stratum by two-stage selection, so 54 independent samples were drawn, one from each sampling stratum. Implicit stratification was achieved at each of the lower geographical or administrative levels by sorting the sampling frame in geographical/administrative order and then selecting with probability proportional to size in the first stage. Together, explicit and implicit stratification help ensure that the sampled points are well spread across the country.
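Explicit and implicit stratification can be sketched as a grouping step followed by a sort. In this sketch each EA is a dictionary with hypothetical keys ('district', 'area', 'geo' for lower-level administrative codes, 'size' for households); the actual frame layout is not described in the text. With 27 domains and two areas each, the grouping yields the 54 explicit strata.

```python
from collections import defaultdict


def build_strata(frame):
    """Group EAs into explicit strata (domain x urban/rural), then sort each
    stratum by its geographic codes so that a systematic selection along the
    list gives implicit stratification.

    Each EA is a dict with hypothetical keys 'district', 'area', 'geo', 'size'.
    """
    strata = defaultdict(list)
    for ea in frame:
        strata[(ea["district"], ea["area"])].append(ea)
    for eas in strata.values():
        eas.sort(key=lambda ea: ea["geo"])
    return dict(strata)


# Illustrative frame with three EAs (made-up codes and sizes).
frame = [
    {"district": "Zomba", "area": "rural", "geo": (2, 5), "size": 210},
    {"district": "Zomba", "area": "urban", "geo": (1, 1), "size": 180},
    {"district": "Zomba", "area": "rural", "geo": (1, 3), "size": 260},
]
strata = build_strata(frame)
print(sorted(strata))
print([ea["geo"] for ea in strata[("Zomba", "rural")]])
```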
In the 2010 MDHS design the primary sampling units (PSUs) are the enumeration areas (EAs) from the 2008 Malawi PHC, and the secondary sampling units (SSUs) are the households.
In the first stage of selection for the 2010 MDHS, 849 EAs were selected with probability proportional to EA size, where the size of an EA is the number of households it contains. After this selection and before data collection, a household listing operation was conducted during May-June 2009 in all 849 selected EAs. The listing operation consisted of visits to every selected EA, during which every structure found on the ground was recorded; each structure was classified by type (residential or not), the number of households in each residential structure was recorded, and a location map and a sketch map were drawn to show the boundaries of the EA and the location of each structure within it. The resulting household list for each selected EA (PSU) served as the sampling frame for selecting households in the second stage.
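Probability proportional to size (PPS) selection is commonly carried out systematically along the cumulated sizes of the sorted frame. The source does not say which PPS procedure was used, so the systematic version below is an assumption; the function names are hypothetical. An EA's inclusion probability is n × M_i / M, where n is the number of EAs selected in the stratum, M_i is the EA's size, and M is the stratum total. EAs of size 0, which exist in the frame, can never be hit.

```python
import bisect
import itertools
import random


def pps_systematic(sizes, n, start=None, rng=None):
    """Select n units with probability proportional to size by systematic
    sampling along the cumulated sizes of an already sorted frame.

    Returns the indices of the selected units. A unit larger than the
    sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    if start is None:
        start = (rng or random.Random()).random() * interval
    cumulative = list(itertools.accumulate(sizes))
    selected = []
    for k in range(n):
        point = start + k * interval
        # Unit i covers [cumulative[i-1], cumulative[i]); size-0 units cover nothing.
        idx = bisect.bisect_right(cumulative, point)
        selected.append(min(idx, len(sizes) - 1))  # guard against float edge cases
    return selected


def first_stage_probability(n, size, total):
    """Inclusion probability n * M_i / M of an EA with `size` households."""
    return n * size / total


print(pps_systematic([100, 200, 300, 400], n=2, start=250))  # [1, 3]
```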
In the second stage of selection, a fixed number of households (20 in each urban PSU and 35 in each rural PSU) was selected by equal-probability systematic sampling. To improve the sampling frame and reduce the listing workload, a few large EAs were subdivided into smaller segments. During fieldwork, a few clusters were found to be much smaller than at the time of listing; even though every household in these clusters was selected, the predetermined number could not be reached. This resulted in a net decrease of 38 households between the sample design and fieldwork phases of the survey, so the final sample included 27,307 eligible households.
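Equal-probability systematic selection from a household list, together with the two-stage design weight it implies, can be sketched as follows. This is a minimal illustration under stated assumptions: a fractional sampling interval, and a base weight equal to the inverse of the first-stage probability times the second-stage probability. The function names and the numbers in the usage lines are hypothetical. Published survey weights typically include further adjustments, such as for nonresponse, which this sketch omits.

```python
import math
import random


def systematic_sample(n_listed, take, start=None, rng=None):
    """Equal-probability systematic selection of `take` households from a
    listing of `n_listed` households, using a fractional interval.

    If the cluster has no more than `take` households, all are selected.
    """
    if n_listed <= take:
        return list(range(n_listed))
    interval = n_listed / take
    if start is None:
        start = (rng or random.Random()).random() * interval
    elif not 0 <= start < interval:
        raise ValueError("start must lie in [0, interval)")
    return [math.floor(start + k * interval) for k in range(take)]


def design_weight(n_psu, ea_size, stratum_households, take, n_listed):
    """Inverse of a household's overall two-stage selection probability."""
    p1 = n_psu * ea_size / stratum_households   # first stage: PPS on census size
    p2 = min(take, n_listed) / n_listed         # second stage: equal probability
    return 1.0 / (p1 * p2)


print(systematic_sample(100, 20, start=2.5)[:4])   # [2, 7, 12, 17]
print(len(systematic_sample(12, 35)))              # 12: small cluster, all taken
print(design_weight(30, 250, 60_000, 35, 280))     # 64.0
```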
The number of households selected per PSU reflects a trade-off between fieldwork efficiency and precision. All women age 15-49 in the selected households, and all men age 15-54 in one-third of the selected households, were eligible to be interviewed. The advantages of this two-stage selection procedure are:
- 1: The selection procedure is simple to implement and reduces possible nonsampling errors in the selection process.
- 2: It is easy to locate the selected households, reducing nonsampling errors and nonresponse.
- 3: Interviewers interview only the households in the pre-selected dwellings. No replacement of dwellings was permitted, which avoids the bias that substituting households can introduce.
MEN'S SUBSAMPLE
In the households selected for the women's survey in each PSU, a subsample of one in three households was selected for the men's survey, and all men age 15-54 in these households were eligible. The men's survey was limited to a subsample of households because of budget restrictions; the subsample nevertheless provides acceptable precision for men's indicators. The minimum sample size is larger for women than for men because complex indicators, such as the total fertility rate and infant and child mortality rates, require larger samples to achieve acceptably small sampling errors, and these data come from interviews with women. The men's subsample was selected randomly from the list of selected households in each PSU, and it is representative of the study domains and of the country as a whole.
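The text says only that the men's subsample was selected randomly, one in three households. One common way to do this, assumed here rather than taken from the source, is to take every third household from the PSU's list of selected households, starting at a random position of 0, 1, or 2. The function name is hypothetical.

```python
import random


def mens_subsample(selected_households, start=None, rng=None):
    """Return every third household from a PSU's list of selected households.

    Assumption: systematic 1-in-3 selection with a random start of 0, 1 or 2.
    """
    if start is None:
        start = (rng or random.Random()).randrange(3)
    if start not in (0, 1, 2):
        raise ValueError("start must be 0, 1 or 2")
    return selected_households[start::3]


rural_psu = list(range(35))   # 35 selected households in a rural PSU
urban_psu = list(range(20))   # 20 selected households in an urban PSU
print(len(mens_subsample(rural_psu, start=1)), len(mens_subsample(urban_psu, start=0)))
# 12 7
```

With a random start, each selected household has a one-in-three chance of entering the men's subsample, which keeps the men's sample self-weighting within each PSU.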
Data Collection Notes
PRETEST
The training for the pretest took place from January through February 2010. Twelve interviewers (six women and six men) and five supervisors were trained to administer the questionnaires. Two laboratory scientists from the Community Health Sciences Unit (CHSU) and a biomarker specialist from ICF Macro trained interviewers to take anthropometric measurements and collect blood for anaemia and HIV testing. The pretest training for the interviewers and supervisors focused on survey objectives, techniques of interviewing, field procedures, and all sections of the household and individual questionnaires. Blood specimen collection procedures were demonstrated and practised, and two days of field practice were held. The trainers and resource persons included professionals from the National Statistical Office (NSO) and ICF Macro.
The pretest fieldwork was conducted in the Northern, Central, and Southern Regions of Malawi by three teams. The teams were divided according to languages spoken by team members. There was one Tumbuka team in the North and two Chichewa teams, one each in the Central and the Southern Regions. The supervisors and editors were drawn from the NSO core technical team. The teams covered 12 enumeration areas, half in urban areas and half in rural areas. At the end of the fieldwork, a debriefing session was held at NSO among all staff involved in the pretest, and the questionnaires were amended based on the pretest findings.
TRAINING OF FIELD STAFF
NSO recruited and trained 318 people for the fieldwork to serve as supervisors, field editors, female and male interviewers, reserve interviewers, and quality control interviewers. Training of field staff for the main survey was conducted during a four-week period in May through June 2010. Specialists in various areas such as HIV/AIDS, malaria, and family planning were invited as guest lecturers. The training course consisted of instruction regarding interviewing techniques and field procedures, a detailed review of items on the questionnaires, instruction and practice in weighing and measuring children, mock interviews between participants in the classroom, and practice interviews with real respondents in areas outside the 2010 MDHS sample points. During this period, field editors, team supervisors, and quality control interviewers were provided with additional training in methods of field editing, data quality control procedures, and fieldwork coordination. Thirty-seven supervisors, 37 editors, 148 female interviewers, and 74 male interviewers were selected to make up 37 data collection teams for the 2010 MDHS. Six people were selected to be quality control interviewers.
FIELDWORK
Thirty-seven interviewing teams carried out data collection for the 2010 MDHS. Each team consisted of one supervisor (team leader), one field editor, four female interviewers, two male interviewers, and one driver. Six senior staff members from NSO, one ICF Macro resident advisor, and one ICF Macro consultant coordinated and supervised fieldwork activities. Data collection took place over a six-month period, from June through November 2010.
HIV AND ANAEMIA TESTING
In a subsample of one-third of all households, blood specimens were collected for anaemia testing from children age 6-59 months and women age 15-49 who voluntarily consented to the testing. In addition, in the one-in-three subsample of households selected for the men's survey, blood specimens were collected for HIV testing from all women age 15-49 and men age 15-54 who consented to the test. The protocol for blood specimen collection and HIV testing was reviewed and approved by the Malawi Health Sciences Research Committee, the Institutional Review Board of ICF Macro, and the Centers for Disease Control and Prevention (CDC) in Atlanta.
Women and men who were interviewed in the 2010 MDHS were asked to voluntarily provide five drops of blood for HIV testing. The protocol for the blood specimen collection and analysis was based on the anonymous linked protocol developed for MEASURE DHS. This protocol allows for the merging of the HIV test results with the sociodemographic data collected in the individual questionnaires, provided that information that could potentially identify an individual is destroyed before data linking takes place.
Interviewers explained the procedure, the confidentiality of the data, and the fact that the test results would not be made available to the respondent. They also explained the option of storing the dried blood spot (DBS) sample for use in additional testing. If a respondent consented to HIV testing, five blood spots from a finger prick were collected on a filter paper card, to which a bar code label unique to the respondent was affixed. If the respondent did not consent to additional testing of the specimen, this refusal was recorded on the questionnaire and the words 'no further testing' were written on the filter paper card. Each household, whether or not its members consented to HIV testing, was given an information brochure on HIV/AIDS and a list of fixed sites providing voluntary counselling and testing (VCT) services in surrounding districts within the region.
Each DBS sample was given a bar code label, with a duplicate label attached to the Individual Questionnaire. A third copy of the same bar code was affixed to the Blood Sample Transmittal Form to track the blood samples from the field to the laboratory. DBS samples were dried overnight and packaged for storage the following morning. Samples were periodically collected in the field, along with the completed questionnaires for each completed cluster, and taken to NSO in Zomba, where they were logged in and checked before being transported to the Community Health Sciences Unit (CHSU) in Lilongwe.
Upon arrival at CHSU, each DBS sample was logged into the CSPro HIV Test Tracking System (CHTTS) database, given a laboratory number, and stored at -20°C until tested. Under the HIV testing protocol, samples could be tested only after all questionnaire data entry had been completed, verified, and cleaned, and all unique identifiers except the barcode number had been removed from the questionnaire file. HIV testing began in February 2011. All samples were first tested with an ELISA, Vironostika® HIV Uni-Form II Plus O (Biomerieux). A negative result on this first test was considered negative. All samples that tested positive were subjected to a second ELISA, Enzygnost® Anti-HIV 1/2 Plus (Dade Behring), and samples positive on the second test were considered positive. If the first and second tests were discordant, the sample was retested with tests 1 and 2: if both repeat results were negative, the final result was negative, and if both were positive, the final result was positive. If the results were still discordant after repeating tests 1 and 2, a third confirmatory test, Western Blot 2.2 (Abbott Labs), was administered. The final result was positive if the Western Blot (WB) was positive and negative if the WB was negative. If the Western Blot result was indeterminate, the sample was classified as indeterminate.
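The testing algorithm above is a small decision procedure. The sketch below encodes it for a single sample. The function name, argument names, and result strings are hypothetical; the function raises an error when the algorithm needs a result that has not been supplied, rather than guessing.

```python
POS, NEG, IND = "positive", "negative", "indeterminate"


def final_hiv_result(elisa1, elisa2=None, repeat=None, western_blot=None):
    """Apply the HIV testing algorithm described above to one DBS sample.

    elisa1, elisa2: first and second ELISA results ("positive"/"negative").
    repeat: (elisa1, elisa2) results after retesting a discordant sample.
    western_blot: "positive", "negative" or "indeterminate".
    """
    if elisa1 == NEG:
        return NEG                       # negative on test 1 is final
    if elisa1 != POS:
        raise ValueError("unexpected first ELISA result")
    if elisa2 == POS:
        return POS                       # positive on both tests
    if elisa2 != NEG:
        raise ValueError("positive on test 1: second ELISA result required")
    if repeat is None:
        raise ValueError("discordant: repeat of tests 1 and 2 required")
    r1, r2 = repeat
    if r1 not in (POS, NEG) or r2 not in (POS, NEG):
        raise ValueError("unexpected repeat result")
    if r1 == r2:
        return r1                        # concordant on repetition
    if western_blot not in (POS, NEG, IND):
        raise ValueError("still discordant: Western Blot result required")
    return western_blot                  # WB decides the final result


print(final_hiv_result(NEG))                               # negative
print(final_hiv_result(POS, NEG, (POS, NEG), IND))         # indeterminate
```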
Upon completion of HIV testing, the HIV test results for the 2010 MDHS were entered into a spreadsheet, with the barcode as the unique identifier for each result. The HIV results and the linked demographic and health data are included in the 2010 MDHS Final Report.
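Under the anonymous linked protocol, the barcode is the only key connecting a test result to a questionnaire record, and identifying information must already be gone. The sketch below merges results into de-identified records by barcode and refuses to link if identifying fields are still present. The field names ("barcode", "name", "address", "hiv") are hypothetical; the actual file layouts are not described in the text.

```python
# Hypothetical identifying fields that must be removed before linking.
IDENTIFYING_FIELDS = {"name", "address"}


def link_hiv_results(individuals, results_by_barcode):
    """Attach HIV results to de-identified individual records by barcode.

    Records without a barcode (no blood sample) or without a matching
    result get hiv=None.
    """
    linked = []
    for record in individuals:
        leaked = IDENTIFYING_FIELDS.intersection(record)
        if leaked:
            raise ValueError(f"remove identifying fields first: {sorted(leaked)}")
        barcode = record.get("barcode")
        linked.append({**record, "hiv": results_by_barcode.get(barcode)})
    return linked


individuals = [{"barcode": "AB12", "age": 30}, {"barcode": None, "age": 41}]
results = {"AB12": "negative"}
print(link_hiv_results(individuals, results))
```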
Estimates of Sampling Error
Unlike nonsampling errors, sampling errors can be evaluated statistically. The sample of respondents selected in the 2010 MDHS is only one of many samples that could have been selected from the same population using the same design and expected size. Each of these samples would yield results that differ somewhat from those of the actual sample selected. Sampling errors measure the variability among all possible samples; although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error of a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any statistic calculated from a sample survey, the interval from two standard errors below to two standard errors above the statistic will contain the true population value in approximately 95 percent of all possible samples of identical size and design.
If the respondents had been selected as a simple random sample, straightforward formulae could have been used to calculate sampling errors. However, the 2010 MDHS sample results from a multi-stage stratified design, so more complex formulae were necessary. Sampling errors for the 2010 MDHS were calculated with the ISSA Sampling Error Module, which uses the Taylor linearisation method of variance estimation for survey estimates that are means or proportions, and the Jackknife repeated replication method for more complex statistics such as fertility and mortality rates.
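For a mean or proportion written as a ratio r = y / x of weighted cluster totals, the linearisation formula usually given in DHS sampling-error appendices is var(r) = (1 - f) / x² × Σ_h [ m_h / (m_h - 1) × (Σ_i z_hi² - z_h² / m_h) ], where z_hi = y_hi - r × x_hi, z_h is the sum of z_hi over the m_h clusters in stratum h, and f is the overall sampling fraction. The sketch below implements that formula; it is an illustration, not the ISSA code, and the input layout is an assumption.

```python
def ratio_variance_taylor(strata, f=0.0):
    """Taylor-linearised variance of the ratio estimate r = y / x.

    strata: list of strata, each a list of (y_hi, x_hi) weighted cluster totals.
    f: overall sampling fraction (usually negligible).
    Returns (r, var_r).
    """
    y = sum(yi for clusters in strata for yi, _ in clusters)
    x = sum(xi for clusters in strata for _, xi in clusters)
    r = y / x
    acc = 0.0
    for clusters in strata:
        m = len(clusters)
        if m < 2:
            raise ValueError("each stratum needs at least two clusters")
        z = [yi - r * xi for yi, xi in clusters]
        z_h = sum(z)
        acc += m / (m - 1) * (sum(zi * zi for zi in z) - z_h * z_h / m)
    return r, (1 - f) * acc / (x * x)


# Toy example: one stratum with three clusters (numbers are illustrative).
r, var = ratio_variance_taylor([[(2, 10), (4, 10), (6, 10)]])
print(r, var ** 0.5)
```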
The Jackknife repeated replication method derives estimates of complex rates from each of several replications of the parent sample and calculates standard errors for these estimates using simple formulae. Each replication omits one cluster and uses all the others to calculate the estimates, creating pseudo-independent replications. The 2010 MDHS had 849 non-empty clusters, so 849 replications were created.
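The delete-one-cluster jackknife can be sketched for a ratio estimate. With k clusters, r_(i) is the estimate computed without cluster i, the pseudo-values are r_i = k × r - (k - 1) × r_(i), and var(r) = Σ (r_i - r)² / (k(k - 1)). This is the textbook unstratified form, shown with a simple ratio rather than a fertility or mortality rate; how ISSA handles strata and rates internally is not described in the source.

```python
def ratio_variance_jackknife(clusters):
    """Delete-one-cluster jackknife variance of r = sum(y) / sum(x).

    clusters: list of (y_i, x_i) weighted cluster totals.
    Returns (r, var_r).
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    y = sum(yi for yi, _ in clusters)
    x = sum(xi for _, xi in clusters)
    r = y / x
    pseudo = []
    for yi, xi in clusters:
        r_without_i = (y - yi) / (x - xi)          # replicate omitting cluster i
        pseudo.append(k * r - (k - 1) * r_without_i)
    var = sum((ri - r) ** 2 for ri in pseudo) / (k * (k - 1))
    return r, var


r, var = ratio_variance_jackknife([(2, 10), (4, 10), (6, 10)])
print(r, var ** 0.5)
```

For the same three toy clusters, this gives the same variance (12/900) as the linearised formula, as expected for a simple ratio with equal cluster denominators.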
In addition to the standard error, ISSA computes the design effect (DEFT) for each estimate, which is defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A DEFT value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates the increase in the sampling error due to the use of a more complex and less statistically efficient design. ISSA also computes the relative error and confidence limits for the estimates.
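The design effect and the relative error are simple ratios once the standard errors are known. The sketch below uses the simple-random-sample standard error of a proportion, sqrt(p(1 - p) / n), and returns None when that standard error is zero, matching the "undefined" convention described below. The proportion, sample size, and design-based standard error are hypothetical.

```python
import math


def srs_standard_error(p, n):
    """Standard error of a proportion p under simple random sampling of n cases."""
    return math.sqrt(p * (1 - p) / n)


def deft(se_design, se_srs):
    """Design effect: design-based SE divided by SRS SE; None if undefined."""
    if se_srs == 0:
        return None
    return se_design / se_srs


def relative_error(se, estimate):
    """Relative standard error SE/R."""
    return se / estimate


# Hypothetical proportion and design-based standard error.
p, n, se_design = 0.2, 400, 0.03
se_srs = srs_standard_error(p, n)
print(round(se_srs, 4), round(deft(se_design, se_srs), 2), round(relative_error(se_design, p), 2))
```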
Sampling errors for the 2010 MDHS are calculated for selected variables considered to be of primary interest for the women's and men's surveys. The results are presented in an appendix to the Final Report for the country as a whole, for urban and rural areas, and for each of the three regions. For each variable, the type of statistic (mean, proportion, or rate) and the base population are given in Table C.1 of the Final Report. Tables C.2 to C.7 present, for each variable, the value of the statistic (R), its standard error (SE), the numbers of unweighted (N-UNWE) and weighted (N-WEIG) cases, the design effect (DEFT), the relative standard error (SE/R), and the 95 percent confidence limits (R±2SE). The DEFT is considered undefined when the standard error under simple random sampling is zero (when the estimate is close to 0 or 1). For the total fertility rate, the number of unweighted cases is not relevant, because there is no known unweighted value for woman-years of exposure to childbearing.
The confidence interval calculated for children ever born to women aged 40-49, for example, can be interpreted as follows. The average from the national sample is 5.711 and its standard error is 0.079, so the 95 percent confidence limits are obtained by adding twice the standard error to, and subtracting it from, the sample estimate: 5.711 ± 2 × 0.079. One can therefore be about 95 percent confident that the true average number of children ever born to all women aged 40 to 49 lies between 5.553 and 5.869.
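The arithmetic in the example above can be reproduced directly. The estimate and standard error come from the text; the relative standard error printed on the second line is derived from them.

```python
estimate, se = 5.711, 0.079   # children ever born, women aged 40-49 (from the text)
lower, upper = estimate - 2 * se, estimate + 2 * se
print(f"{lower:.3f} to {upper:.3f}")         # 5.553 to 5.869
print(f"relative SE: {se / estimate:.1%}")   # relative SE: 1.4%
```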
Sampling errors are analysed for the national sample of women for two separate groups of estimates: (1) means and proportions, and (2) complex demographic rates. The relative standard errors (SE/R) for the means and proportions range from 0.1 percent to 25.1 percent. In general, the highest relative standard errors are for estimates of very small values (e.g., the proportion of women currently using an IUD, 0.1 percent), and such estimates are very few. The relative standard error for most estimates for the country as a whole is therefore small, except for estimates of very small proportions. The relative standard error for the total fertility rate is small, at 1.4 percent. For the mortality rates, however, the average relative standard error is higher; for example, the relative standard error for the infant mortality rate for the 0-4 years preceding the survey is 3.8 percent.