How to cite datasets used in your research publications


It is good research practice to cite primary data sources in your published papers, just as you would cite research publications.

Here are some tips on citing data you use in your research:

  • Identify the data early in your paper, preferably in the abstract.
  • Include a dedicated "data" section so that readers can immediately identify the data that underlies your work.
  • Refer to your data sources in your data tables.
  • Cite data in your references. Reference lists are indexed more often than the full text of papers, so a citation placed there is more visible.
  • Cite the exact version of the data used in your research, to support data discovery. That is, the version in the citation must be the version you used in your research.
  • Include a unique identifier for the dataset when you cite data, such as a Digital Object Identifier (DOI).

DataFirst is a member of DataCite, the international organisation that supports data citation for better research. We use the DataCite data citation standard, and we mint DOIs for our datasets via DataCite’s Fabrica DOI allocation platform.

To cite our data, check the Recommended Citation field in the metadata provided with each dataset. The citation format is as follows:
Name of producer. Survey name and date [dataset]. Version number. Place of production: Producer [producer], date of production. Place of distribution: Distributor [distributor], date of distribution. DOI:

For example:
Statistics South Africa. General Household Survey 2018 [dataset]. Version 1. Pretoria: Statistics SA [producer], 2019. Cape Town: DataFirst [distributor], 2019. DOI: https://doi.org/10.25828/9tmn-fz97.
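
If you keep dataset details in a script or reference manager, the format above can be assembled programmatically. The Python sketch below is an illustration only and is not a DataFirst tool: the function name format_citation and its parameter names are hypothetical, and it assumes the layout of the example above, including the distribution year that follows the distributor. Always compare the result with the Recommended Citation field of the dataset you used.

```python
def format_citation(
    producer: str,
    title: str,
    version: str,
    production_place: str,
    producer_short: str,
    production_year: int,
    distribution_place: str,
    distributor: str,
    distribution_year: int,
    doi: str,
) -> str:
    """Build a dataset citation string in the layout of the example above.

    All parameter names are hypothetical; they mirror the parts of the
    citation format rather than any official metadata schema.
    """
    return (
        f"{producer}. {title} [dataset]. Version {version}. "
        f"{production_place}: {producer_short} [producer], {production_year}. "
        f"{distribution_place}: {distributor} [distributor], {distribution_year}. "
        f"DOI: {doi}."
    )


# Rebuild the General Household Survey 2018 example from its parts.
citation = format_citation(
    producer="Statistics South Africa",
    title="General Household Survey 2018",
    version="1",
    production_place="Pretoria",
    producer_short="Statistics SA",
    production_year=2019,
    distribution_place="Cape Town",
    distributor="DataFirst",
    distribution_year=2019,
    doi="https://doi.org/10.25828/9tmn-fz97",
)
print(citation)
```

Passing the version as its own argument makes it harder to forget to update the citation when you move to a newer release of a dataset.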

Contact our helpdesk at support[at]data1st.org for help with citing our data in your published research.