The GHS is an annual household survey specifically designed to measure the living circumstances of South African households. The GHS collects data on education, employment, health, housing and household access to services.
Kind of Data
Sample survey data
Unit of Analysis
Households and individuals
v2.2: Edited, anonymised dataset for public distribution
Version 1 of the General Household Survey 2010 was released in 2011. This version was recalled to allow Statistics SA to undertake further work on the data for disability.
Version 2 of the GHS 2010 was downloaded from the Stats SA website in 2011. This version contains the old weights, which are no longer available for download from the Stats SA website.
In v2, compared to v1, the values for the derived variable "undisab" (UN Disability) have been redefined.
The data files in GHS 2010 Version 2.1 contain revised weights. This version was released at the same time as the GHS 2012 (22 August 2013). Reweighting was necessary to keep the population estimates used in the GHS comparable with the figures provided by the 2013 mid-year population estimation model, which incorporates the demographic findings of Census 2011. Household files were weighted independently of person files.
Compared with the household file in GHS 2010 v2, there are 29 new rows of data in v2.1.
The following variables were imputed in the person file of GHS 2010 v2.1: disab; sevdisab; literacy3b; metro
The variable “q22msal” (monthly salary) was added to the person file in GHS 2010 v2.1. This variable was not present in the previous version.
Version 2.2 contains revised weights. This version was released at the same time as GHS 2017 (21 June 2018). It was decided to replace the 2013 series mid-year population estimates used in the previous version with the more recent 2017 series mid-year population estimates as benchmarks for weighting the GHS data files. Household files were weighted independently of person files.
The scope of GHS 2010 includes:
Demographic information: name, sex, age, population group, etc.
Tourism information: non-remunerated trips undertaken in the 12 months prior to the survey.
Household information: type of dwelling, ownership of dwelling and other assets, electricity, water and sanitation, environmental issues, services, transport, expenditure etc.
The survey is representative at national level and at provincial level.
The lowest level of geographic aggregation is province. Geography type and metro information are also present in the data.
The survey covered all de jure household members (usual residents) of households in the nine provinces of South Africa and residents in workers' hostels. The survey does not cover collective living quarters such as students' hostels, old age homes, hospitals, prisons and military barracks.
Producers and sponsors
Statistics South Africa
Government of South Africa
A multi-stage, stratified random sample was drawn using probability-proportional-to-size principles. First level stratification was based on province and second-tier stratification on district council.
Until 2010 Statistics SA used an integrated weighting methodology. "Integrated" weights allocated the same weight to all household members. The household head's weight was carried over to the house file. This model allowed the population size to be replicated by multiplying household sizes by the household weight. However, this method produced variable household totals from year to year.
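The property described above can be checked with a minimal sketch. The records and field names (hh_id, size, weight) are toy values invented for illustration and are not GHS variables or figures; the sketch only shows that when every member shares the household weight, household size multiplied by household weight reproduces the person-based population total.

```python
# Toy data only: these are not GHS values, and the field names are hypothetical.
persons = [
    {"hh_id": "A", "weight": 250.0},
    {"hh_id": "A", "weight": 250.0},
    {"hh_id": "A", "weight": 250.0},
    {"hh_id": "B", "weight": 400.0},
    {"hh_id": "B", "weight": 400.0},
]
households = [
    {"hh_id": "A", "size": 3, "weight": 250.0},
    {"hh_id": "B", "size": 2, "weight": 400.0},
]


def population_from_persons(rows):
    """Weighted population total from the person file."""
    return sum(row["weight"] for row in rows)


def population_from_households(rows):
    """Weighted population total from the house file (size x weight)."""
    return sum(row["size"] * row["weight"] for row in rows)


# Under integrated weights the two totals agree.
print(population_from_persons(persons))
print(population_from_households(households))
```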
Therefore, from 2010, the Person and House files across the whole GHS series are calibrated independently of each other. The person data is calibrated using the mid-year population estimates from the 2017 series, while the house data is weighted using household estimates that are also based on the 2017 mid-year population series. However, this method means that the totals from the two files will not be aligned.
For weights that are better aligned, users can transfer the weight allocated to the household head to the household file. Statistics SA ensures that all households in the house file are also represented in the person file.
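A minimal sketch of that transfer, using plain Python records. The field names (hh_id, is_head, person_wgt, head_wgt) and the sample rows are hypothetical placeholders, not the actual GHS variable names; users would substitute the household identifier, head-of-household indicator and person weight from the GHS metadata.

```python
def head_weights(person_rows):
    """Map each household id to the person weight of its head.

    Assumes hypothetical fields: hh_id, is_head, person_wgt.
    """
    lookup = {}
    for row in person_rows:
        if row["is_head"]:
            lookup[row["hh_id"]] = row["person_wgt"]
    return lookup


def attach_head_weight(house_rows, person_rows):
    """Return copies of the house rows with the head's weight added as head_wgt.

    A household with no head in the person rows gets head_wgt = None.
    """
    lookup = head_weights(person_rows)
    return [dict(row, head_wgt=lookup.get(row["hh_id"])) for row in house_rows]


# Toy rows for illustration only (not GHS data).
sample_persons = [
    {"hh_id": "A", "is_head": True, "person_wgt": 250.0},
    {"hh_id": "A", "is_head": False, "person_wgt": 310.0},
    {"hh_id": "B", "is_head": True, "person_wgt": 400.0},
]
sample_houses = [{"hh_id": "A"}, {"hh_id": "B"}]

for house in attach_head_weight(sample_houses, sample_persons):
    print(house)
```

The original house rows are left unchanged, so the independently calibrated household weight can be kept alongside the transferred head weight for comparison.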
Dates of Data Collection
Data Collection Mode
Statistics South Africa
Government of South Africa
The GHS uses questionnaires as data collection instruments.
In GHS 2009-2010:
The variable on care provision (Q129acre) in the GHS 2009 and 2010 should be used with caution. The question to collect the data (question 1.29a) asks:
"Does anyone in this household personally provide care for at least two hours per day to someone in the household who - owing to frailty, old age, disability, or ill-health cannot manage without help?"
Response codes (in the questionnaire, metadata, and dataset) are:
1 = No
2 = Yes, 2-19 hours per week
3 = Yes, 20-49 hours per week
4 = Yes, 50 + hours per week
5 = Do not know
The question and the response options are inconsistent: the question asks about hours per day, while the response options record hours per week. As a result, a respondent who gives care for one hour per day (7 hours per week) would presumably not report it under this question. Someone giving care for 13 hours a week would also be excluded, as though they do not do serious caregiving, which is incorrect.
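The mismatch can be made concrete with a short sketch that compares the weekly bands of the response codes with the two-hours-per-day threshold in the question wording. The function names are illustrative, and the handling of band edges (for example, treating fewer than 2 hours per week as code 1) is an assumption, since the questionnaire does not define them.

```python
HOURS_PER_DAY_THRESHOLD = 2  # from the wording of question 1.29a


def weekly_response_code(hours_per_week):
    """Return the response code whose weekly band contains hours_per_week.

    Band edges are an assumption made for this sketch.
    """
    if hours_per_week < 2:
        return 1  # No
    if hours_per_week < 20:
        return 2  # Yes, 2-19 hours per week
    if hours_per_week < 50:
        return 3  # Yes, 20-49 hours per week
    return 4      # Yes, 50+ hours per week


def meets_daily_threshold(hours_per_week):
    """True if the weekly hours average at least two hours per day."""
    return hours_per_week / 7 >= HOURS_PER_DAY_THRESHOLD


# 7 and 13 hours per week fit response code 2 but fail the question's threshold.
for hours in (7, 13, 14):
    print(hours, weekly_response_code(hours), meets_daily_threshold(hours))
```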
In GHS 2009-2015:
The variable on land size in the General Household Survey questionnaire for 2009-2015 should be used with caution. The data comes from questions on the households' agricultural activities in Section 8 of the GHS questionnaire: Household Livelihoods: Agricultural Activities. Question 8.8b asks:
“Approximately how big is the land that the household use for production? Estimate total area if more than one piece.” One of the response categories is worded as:
1 = Less than 500m2 (approximately one soccer field)
However, a soccer field is 5000 m2, not 500 m2, so response category 1 is worded incorrectly. The category should read 5000 m2. The response option is correct in GHS 2002-2008, and the error was flagged and corrected by Statistics SA in the GHS 2016.
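For users working across several GHS years, the two cautions above can be encoded as a small lookup. This is only a sketch: the function name and the caveat wording are illustrative, and the year ranges are taken directly from the notes above.

```python
def ghs_caveats(year):
    """List the data quality notes above that apply to a given GHS year."""
    caveats = []
    if 2009 <= year <= 2010:
        caveats.append(
            "care provision (question 1.29a): question asks hours per day, "
            "response codes record hours per week"
        )
    if 2009 <= year <= 2015:
        caveats.append(
            "land size (question 8.8b): category 1 is worded as 500 m2 "
            "but describes a soccer field"
        )
    return caveats


for survey_year in (2008, 2010, 2012, 2016):
    print(survey_year, ghs_caveats(survey_year))
```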
Statistics South Africa. General Household Survey 2010 [dataset]. Version 2.2. Pretoria. Statistics South Africa [producer]. 2017. Cape Town. DataFirst [distributor], 2020. DOI: https://doi.org/10.25828/qv5p-8s75