Weighting
*** Weight Variables Included in the Student Data Files ***
Each student’s sampling weight is a composite of five factors: the school weighting factor, the school weighting adjustment, the class weighting factor, the student weighting factor, and the student weighting adjustment. In addition, three versions of each student’s weight are provided – the “total student” weight, the “senate” weight, and the “house” weight – each with its own particular uses.
The variables described in this section are included in the Student Background and Student Achievement files. The meaning and interpretation of the weights in each of the files is the same. The weighting factors included in the student-level data files and their adjustment factors are as follows:
WGTFAC1 School Weighting Factor
This variable corresponds to the inverse of the probability of selection for the school where the student is enrolled.
WGTADJ1 School Weighting Adjustment
This is an adjustment that is applied to WGTFAC1 to account for nonparticipating schools in the sample. Multiplying WGTFAC1 by WGTADJ1 gives the sampling weight for the school, adjusted for non-participation.
WGTFAC2 Class Weighting Factor
This is the inverse of the probability of selection of the classroom within the school. Since, in general, only one classroom was selected per grade within each school, there was no need to compute an adjustment factor for the classroom weight.
WGTFAC3 Student Weighting Factor
This is the inverse of the probability of selection for the individual student within a classroom. In cases where an intact classroom was selected, the value is set to 1 for all members of the classroom.
WGTADJ3 Student Weighting Adjustment
This is an adjustment applied to the variable WGTFAC3 to account for nonparticipating students in the selected school and/or classroom. Multiplying the variables WGTFAC2, WGTFAC3, and WGTADJ3 and adding them up within each school gives an estimate of the number of students within the sampled school.
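The within-school estimate described for WGTADJ3 can be sketched as follows. This is only an illustration with made-up values; the key `school_id` is a hypothetical stand-in for whatever school identifier the data files use.

```python
from collections import defaultdict

def estimated_students_per_school(records):
    """Sum WGTFAC2 * WGTFAC3 * WGTADJ3 over the sampled students of each school."""
    totals = defaultdict(float)
    for r in records:
        # "school_id" is a hypothetical key, not a documented variable name.
        totals[r["school_id"]] += r["WGTFAC2"] * r["WGTFAC3"] * r["WGTADJ3"]
    return dict(totals)

# Made-up records: two sampled students in school 1, one in school 2.
students = [
    {"school_id": 1, "WGTFAC2": 3.0, "WGTFAC3": 1.0, "WGTADJ3": 1.5},
    {"school_id": 1, "WGTFAC2": 3.0, "WGTFAC3": 1.0, "WGTADJ3": 1.5},
    {"school_id": 2, "WGTFAC2": 2.0, "WGTFAC3": 1.0, "WGTADJ3": 1.0},
]
school_sizes = estimated_students_per_school(students)
```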
The five variables listed above are combined to give a student’s overall sampling weight. A student is selected in three successive stages: the school, the classroom within the school, and the student within the classroom. The overall probability of selection is therefore the product of the three stage-level selection probabilities. Because each weighting factor is the inverse of the corresponding selection probability, the student’s overall sampling weight is the product of the three weighting factors and their respective adjustment factors.
Three versions of the students’ sampling weight are provided in the user database. All three give the same figures for statistics such as means and proportions, but vary for statistics such as totals and population sizes. Each one has particular advantages in certain circumstances. These three versions are as follows:
TOTWGT Total Student Weight
This is obtained by simply multiplying the variables WGTFAC1, WGTADJ1, WGTFAC2, WGTFAC3, and WGTADJ3 for the student. The sum of these weights within a sample provides an estimate of the size of the population. Although this is a commonly used sampling weight, it sometimes adds to a very large number, and to a different number within each country. This is not always desirable. For example, if you want to compute a weighted estimate of the mean achievement in the population across all countries, using the variable TOTWGT as your weight variable will lead each country to contribute proportionally to its population size, with the large countries counting more than small countries. Although this is desirable in some circumstances (e.g., when computing the 75th percentile for mathematics achievement for students around the world), in general TOTWGT is not the student weight of choice for cross-country analyses, since it does not treat countries equally, and gives inflated results in significance tests when the proper adjustments are not used.
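As a minimal sketch of the TOTWGT definition, the product of the five factors can be computed per student and summed to estimate the population size. The record values below are made up for illustration, and the function names are hypothetical.

```python
def total_student_weight(record):
    """Product of the five weighting variables described above."""
    return (
        record["WGTFAC1"] * record["WGTADJ1"]
        * record["WGTFAC2"]
        * record["WGTFAC3"] * record["WGTADJ3"]
    )

def estimated_population(records):
    """The sum of the total student weights estimates the population size."""
    return sum(total_student_weight(r) for r in records)

# Made-up values for two sampled students.
sample = [
    {"WGTFAC1": 20.0, "WGTADJ1": 1.25, "WGTFAC2": 2.0, "WGTFAC3": 1.0, "WGTADJ3": 1.25},
    {"WGTFAC1": 10.0, "WGTADJ1": 1.0, "WGTFAC2": 4.0, "WGTFAC3": 1.0, "WGTADJ3": 1.0},
]
weights = [total_student_weight(r) for r in sample]
population = estimated_population(sample)
```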
SENWGT Senate Weight
Within each country, the variable SENWGT is proportional to TOTWGT, rescaled by the ratio of 500 to the sum of the weights over all students in the grade. Use these sampling weights when international estimates are sought and each country should contribute the same amount to the international estimate. When SENWGT is used as the sampling weight for international estimates, the contribution of each country is the same, regardless of the size of its population. See the PIRLS 2001 User Guide for more information.
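The rescaling behind the senate weight can be sketched as below, assuming the weights are scaled so that they sum to exactly 500 within each country. The key `country` and the function name are hypothetical, and the weights are made up.

```python
from collections import defaultdict

def senate_weights(records, target=500.0):
    """Rescale TOTWGT so the weights sum to `target` within each country."""
    totals = defaultdict(float)
    for r in records:
        # "country" is a hypothetical key for the country identifier.
        totals[r["country"]] += r["TOTWGT"]
    return [r["TOTWGT"] * target / totals[r["country"]] for r in records]

# Made-up data: country A is much larger than country B.
students = [
    {"country": "A", "TOTWGT": 100.0},
    {"country": "A", "TOTWGT": 300.0},
    {"country": "B", "TOTWGT": 10.0},
    {"country": "B", "TOTWGT": 10.0},
]
senwgt = senate_weights(students)
```

After rescaling, both countries carry the same total weight even though their TOTWGT sums differ by a factor of 20.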
HOUWGT House Weight
Within each country, the variable HOUWGT is proportional to TOTWGT, rescaled by the ratio of the sample size (n) to the sum of the weights over all students in the grade. Use these sampling weights when you want the actual sample size to be used in performing significance tests. Although some statistical software packages allow you to use the sample size as the divisor in the computation of standard errors, others use the sum of the weights, which results in severely deflated standard errors if TOTWGT is used as the weighting variable. When performing analyses with such software, use HOUWGT as the weight variable. Because of the clustering effect in most PIRLS samples, it may also be desirable to apply a correction factor, such as a design effect, to the HOUWGT variable.
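A minimal sketch of the house weight rescaling follows, assuming the weights are scaled so that they sum to the actual sample size within each country. The key `country` and the function name are hypothetical, and the weights are made up.

```python
from collections import defaultdict

def house_weights(records):
    """Rescale TOTWGT so the weights sum to the sample size n in each country."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        # "country" is a hypothetical key for the country identifier.
        totals[r["country"]] += r["TOTWGT"]
        counts[r["country"]] += 1
    return [
        r["TOTWGT"] * counts[r["country"]] / totals[r["country"]]
        for r in records
    ]

# Made-up data: a sample of two students whose TOTWGT values sum to 400.
students = [
    {"country": "A", "TOTWGT": 100.0},
    {"country": "A", "TOTWGT": 300.0},
]
houwgt = house_weights(students)
```

The relative sizes of the weights are unchanged, so means and proportions are unaffected; only the total is brought down from the estimated population size to n.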
*** Weight Variables Included in the Student-Teacher Linkage Files ***
The individual student sampling weights should generally be used when you want to obtain estimates at the student level. The exception is when student and teacher data are to be analyzed together. In this case, a separate set of weights has been computed to account for the fact that a student could have more than one teacher. This set of weights is included in the Student-Teacher Linkage file and is listed below.
TCHWGT
This weight is computed by dividing the sampling weight for the student by the number of teachers that the student has. This weight should be used whenever you want to obtain estimates regarding students and their teachers. The Student-Teacher Linkage file also includes variables that indicate the number of teachers the student has.
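The TCHWGT computation can be sketched as follows, assuming one linkage record per student-teacher pair. The keys `student_id` and `teacher_id` are hypothetical, and the sketch counts teachers from the records themselves rather than reading the count variables that the linkage file provides.

```python
from collections import Counter

def teacher_link_weights(link_records):
    """Divide each student's sampling weight by that student's number of teachers."""
    # Count linkage records per student; "student_id" is a hypothetical key.
    teachers_per_student = Counter(r["student_id"] for r in link_records)
    return [
        r["TOTWGT"] / teachers_per_student[r["student_id"]]
        for r in link_records
    ]

# Made-up linkage records: student 1 has two teachers, student 2 has one.
links = [
    {"student_id": 1, "teacher_id": "T1", "TOTWGT": 50.0},
    {"student_id": 1, "teacher_id": "T2", "TOTWGT": 50.0},
    {"student_id": 2, "teacher_id": "T1", "TOTWGT": 30.0},
]
tchwgt = teacher_link_weights(links)
```

Summed over a student's linkage records, the divided weights add back up to the student's own sampling weight, so a student with several teachers is not counted more than once.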
*** Weight Variables Included in the School Data Files ***
The PIRLS samples are samples of students within countries. Although they are made up of a sample of schools within the countries, the samples of schools are selected so that the sampling of students, rather than the sampling of schools, is optimized. In particular, the probability-proportional-to-size sampling methodology causes large schools to be oversampled. Several weight variables are included in the school files, as follows:
WGTFAC1 School Weighting Factor
This variable corresponds to the inverse of the probability of selection for the school where the student is enrolled.
WGTADJ1 School Weighting Adjustment
This is an adjustment that is applied to WGTFAC1 to account for nonparticipating schools in the sample. Multiplying WGTFAC1 by WGTADJ1 gives the sampling weight for the school, adjusted for non-participation.
SCHWGT School-level Weight
The school sampling weight is the inverse of the probability of selection for the school, multiplied by its corresponding adjustment factor. It is computed as the product of WGTADJ1 and WGTFAC1. Although this weight variable can be used to estimate the number of schools with certain characteristics, keep in mind that the sample selected for PIRLS is a good sample of students, but not necessarily an optimal sample of schools. Schools are selected with probability proportional to their size, so large schools are expected to be overrepresented in the sample. For countries that sampled by track within school, SCHWGT is based on the track size rather than the total school size. This may lead to invalid school-weighted analyses.
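As an illustration of SCHWGT, the sketch below computes the school weight and uses it to estimate the number of schools with a given characteristic. The `is_urban` flag and the values are made up, and the caveats above about school-level estimates still apply.

```python
def school_weight(record):
    """SCHWGT as the product of WGTFAC1 and WGTADJ1."""
    return record["WGTFAC1"] * record["WGTADJ1"]

def estimated_school_count(records, has_characteristic):
    """Sum SCHWGT over the sampled schools that satisfy a condition."""
    return sum(school_weight(r) for r in records if has_characteristic(r))

# Made-up school records; "is_urban" is a hypothetical background variable.
schools = [
    {"WGTFAC1": 10.0, "WGTADJ1": 1.25, "is_urban": True},
    {"WGTFAC1": 5.0, "WGTADJ1": 1.0, "is_urban": False},
    {"WGTFAC1": 20.0, "WGTADJ1": 1.5, "is_urban": True},
]
urban_schools = estimated_school_count(schools, lambda r: r["is_urban"])
```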
*** Other Sampling Variables Included in the Student and Student-Teacher Link Files ***
With complex sampling designs that involve more than simple random sampling, as in the case of PIRLS where a multi-stage cluster design was used, there are several methods for estimating the sampling error of a statistic that avoid the assumption of simple random sampling. One such method is the jackknife repeated replication (JRR) technique (Wolter, 1985). The particular application of the JRR technique used in PIRLS is termed a paired selection model because it assumes that the sampled population can be partitioned into strata, with the sampling in each stratum consisting of two primary sampling units (PSU), selected independently. The following variables capture the information necessary to estimate correct standard errors using the JRR technique:
JKZONE
The variable JKZONE indicates the sampling zone or stratum to which the student’s school is assigned. The sampling zones can have values from 1 to 75. This variable is included in the Student Background and Student Achievement data files.
JKREP
The variable JKREP indicates the PSU, and its value is used to determine how the student is to be used in the computation of the replicate weights. This variable can have values of either 1 or 0. Student records with a value of 0 should be excluded from the corresponding replicate weight, and those with a value of 1 should have their weights doubled. This variable is included in the Student Background and Student Achievement data files. For each individual student, this variable is identical in these two files. Additionally, the variables JKCZONE and JKCREP are included in the school file.
JKCREP
The variable JKCREP can have values of either 1 or 0. It indicates whether this school is to be dropped or have its weight doubled when estimating standard errors. Those school records with a value of 0 should be excluded from the corresponding replicate weight, and those with a value of 1 should have their weights doubled.
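The replicate-weight rule described for JKZONE and JKREP can be sketched as below for a weighted mean. Two things are assumptions not stated in this section: the variance is taken as the sum over zones of the squared difference between each replicate estimate and the full-sample estimate, which is the usual form for a paired JRR, and `score` is a hypothetical analysis variable. The data are made up.

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jrr_mean_and_se(records, value_key="score", weight_key="TOTWGT"):
    """Weighted mean and its JRR standard error (assumed paired-JRR formula)."""
    values = [r[value_key] for r in records]
    base = [r[weight_key] for r in records]
    full_estimate = weighted_mean(values, base)

    variance = 0.0
    for zone in sorted({r["JKZONE"] for r in records}):
        # Within the current zone, JKREP == 1 doubles the weight and
        # JKREP == 0 drops the record; other zones keep their weights.
        replicate = [
            w * 2 * r["JKREP"] if r["JKZONE"] == zone else w
            for r, w in zip(records, base)
        ]
        variance += (weighted_mean(values, replicate) - full_estimate) ** 2
    return full_estimate, math.sqrt(variance)

# Made-up data: two zones, each with one record per PSU.
students = [
    {"JKZONE": 1, "JKREP": 1, "TOTWGT": 1.0, "score": 500.0},
    {"JKZONE": 1, "JKREP": 0, "TOTWGT": 1.0, "score": 520.0},
    {"JKZONE": 2, "JKREP": 1, "TOTWGT": 1.0, "score": 480.0},
    {"JKZONE": 2, "JKREP": 0, "TOTWGT": 1.0, "score": 500.0},
]
mean, se = jrr_mean_and_se(students)
```

The same loop would apply to school-level estimates by substituting JKCZONE, JKCREP, and SCHWGT for the student-level variables.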