The Data Curation Process


DataFirst is involved in the entire Data Curation Lifecycle to support the research process. Our Microdata Service Model shows how we curate data for reuse and the other services we offer. The model also shows how DataFirst supports the virtuous cycle of reuse: we work with data depositors to improve the quality of their data products, based on feedback from researchers.

 

1. Accepting Data Deposits
 
Collections Policy – DataFirst accepts deposits of unit record data from census or survey research, or administrative records.
Formats – DataFirst accepts data files in ASCII and all proprietary formats, e.g. Excel and Stata.
Documentation – Background documentation helps support data reuse. Any documentation pertaining to the research should be deposited with the data files, including questionnaires, codebooks, and reports.
Data Ownership – Depositors should ensure that they are the data owners and have the right to deposit data for sharing by DataFirst.

2. Assuring Data
 
2.1 Disclosure Control
Once a dataset has been deposited with us, we undertake disclosure control to ensure the final shared data files do not contain personal data that could be used to identify individuals.  View the DataFirst Disclosure Control Flowchart.
 
2.2 Data Quality Checking
All datasets deposited with us undergo quality checks to confirm the accuracy and usability of the data. Anomalies in data files and documents are corrected in consultation with data depositors. Errors and corrections are recorded as Data Quality Notes in the metadata provided with each dataset.
 
2.3 File versioning
Data files changed as a result of quality checks receive new version numbers. File naming and versioning follow the Data Documentation Initiative (DDI) standard. DataFirst versions at file level as well as at dataset level, so individual data files within a dataset may not share the same version number. The version number of the dataset is that of the latest data file. Notes on the changes are included in the metadata for new versions. The advantage is that researchers need to download and recheck only the files that have changed, not the whole dataset.
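
The file-level versioning rule can be illustrated with a short Python sketch. It is not DataFirst's actual tooling: the file names, the (major, minor) numbering scheme, the release dates, and the reading of "latest data file" as the most recently revised file are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFile:
    name: str                 # hypothetical file name
    version: tuple[int, int]  # (major, minor): an assumed numbering scheme
    revised: date             # made-up date this file version was released


def dataset_version(files):
    """Return the version of the most recently revised file (assumed rule)."""
    latest = max(files, key=lambda f: f.revised)
    return latest.version


def files_to_redownload(old, new):
    """Names of files that are new or whose version changed between releases."""
    old_versions = {f.name: f.version for f in old}
    return sorted(f.name for f in new if old_versions.get(f.name) != f.version)


# Two illustrative releases of the same dataset; only person.dta was corrected.
release_1 = [
    DataFile("household.dta", (1, 0), date(2020, 1, 15)),
    DataFile("person.dta", (1, 0), date(2020, 1, 15)),
]
release_2 = [
    DataFile("household.dta", (1, 0), date(2020, 1, 15)),
    DataFile("person.dta", (1, 1), date(2021, 6, 1)),
]

print(dataset_version(release_2))
print(files_to_redownload(release_1, release_2))
```

Under these assumptions the second release carries dataset version (1, 1), and a researcher who already holds the first release would need to fetch only person.dta.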

3. Describing the Data (Metadata Creation)
 
Extensive provenance and usage information is created for each dataset in our collection. This metadata is created according to the DDI data description standard using the metadata creation template available free from NESSTAR or the International Household Survey Network.

4. Archiving Data
 
An archival version of all iterations of each dataset is retained by DataFirst. Archival copies are securely preserved and migrated as technology changes, to ensure they are always accessible.

5. Supporting Data Discovery
 
Subject and country searches for data are supported. Datasets can be searched at study or variable level.

6. Disseminating Data

DataFirst disseminates datasets under Creative Commons Licenses.

Access and Use Licenses Used by DataFirst

Creative Commons (CC BY) Attribution-Only License

  • This license allows re-users to remix, adapt, and build upon the data in any medium or format.
  • This license allows for commercial use.
  • This license requires that attribution be given to the data producer as well as DataFirst as the data distributor. The data should be cited according to our recommended citation.
  • The proviso for this license is that the data re-user sends a copy of, or a link to, any publications they have produced that are based on the data.

Creative Commons (CC BY-SA) Attribution-ShareAlike License

  • This license allows re-users to remix, adapt, and build upon the data in any medium or format.
  • This license requires the user to distribute any work from the data under the same license as the original.
  • This license requires that attribution be given to the data producer as well as DataFirst as the data distributor. The data should be cited according to our recommended citation.

Creative Commons (CC BY-NC) Attribution-NonCommercial License

  • This license allows re-users to remix, adapt, and build upon the data in any medium or format.
  • This license allows re-users to use the data for non-commercial purposes only.
  • This license requires that attribution be given to the data producer as well as DataFirst as the data distributor. The data should be cited according to our recommended citation.
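
The three licenses above can be summarised as a small lookup table, sketched here in Python. The class and field names are hypothetical. Two entries go beyond what the text states and are assumptions: CC BY-SA is marked as permitting commercial use (as the standard Creative Commons license does), and the publication proviso is treated as applying only to CC BY because the text lists it only there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    commercial_use: bool       # may the data be used commercially?
    share_alike: bool          # must derived work carry the same license?
    notify_publications: bool  # must re-users send a copy of or link to publications?
    attribution: bool = True   # all three licenses require attribution


LICENSES = {
    "CC BY": LicenseTerms(commercial_use=True, share_alike=False,
                          notify_publications=True),
    # Assumption: commercial use follows the standard CC BY-SA terms.
    "CC BY-SA": LicenseTerms(commercial_use=True, share_alike=True,
                             notify_publications=False),
    "CC BY-NC": LicenseTerms(commercial_use=False, share_alike=False,
                             notify_publications=False),
}


def permits(license_code, commercial):
    """Return True if the intended use is allowed under the given license."""
    terms = LICENSES[license_code]
    return terms.commercial_use or not commercial


print(permits("CC BY-NC", commercial=True))
```

A re-user planning commercial work could call permits with the license code shown in a dataset's metadata before downloading; the license text itself remains the authoritative statement of terms.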

 

 

Contact Us

Visit:  Suite 3.48, 3rd floor, School of Economics Building, Middle Campus, University of Cape Town, Rondebosch, Cape Town 
Mail: Private Bag X3, Rondebosch, 7701, Cape Town, South Africa 
Call: +27 (0)21 650 5708

Have Any Queries

Track query: support.data1st.org
