1 The Junior School Project

We shall analyze Peter Mortimer’s dataset from his work “A Study of Effective Elementary Schools” (Mortimer, Ecob, and Sammons 1989). This dataset, known as the “Junior School Project” (JSP), is available as jsp in the R package faraway.

The JSP was a longitudinal study of about 2,000 pupils from 50 primary schools chosen at random among the 636 schools under the Inner London Education Authority (ILEA) in 1980. Here we will use the subset available from that dataset; its variables appear in the str(jsp) listing below.

1.1 Skew Correction

The Maths and Raven scores have negatively skewed (long left tail) distributions due to a ceiling effect: among the most proficient students, scores vary very little relative to the ability being measured. Also, the range of the English scores differs from that of the other tests. To correct these problems, we have transformed the math, english, and raven variables using the R package bestNormalize (Peterson and Cavanaugh 2019), which evaluates several possible monotonic transformations (e.g., Yeo-Johnson, Lambert) and applies the most effective one for the data at hand. For our data, the selected method was Ordered Quantile normalization for all the scores. We applied the correction to each test score separately by school year.


Figure 1.1: JSP Correction for Bias in Scores. (Left) Histogram with the original scores in math, and (Right) after normalization.

This normalization is not as arbitrary as it may seem: it is not an uncommon practice in educational settings. The ceiling effect occurs when a test does not discriminate well enough among the most outstanding students; consequently, from a certain level of competence onwards, they all receive very similar scores. Ideally, one would add more difficult questions to the test to better discriminate among those skilled students. However, in after-the-fact cases like this one, normalization redistributes the scores to correct for the effect.

The transformed variables have also been standardized (centered and scaled) to have mean zero and standard deviation one. These transformed variables have names ending in _z. Figure 1.1 shows the distribution of the original maths scores and of the normalized and standardized ones.
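To make the Ordered Quantile (ORQ) idea concrete, here is a minimal base-R sketch of the transformation: it maps each score to the normal quantile of its rank, then standardizes. This only approximates what bestNormalize::orderNorm() does (the real function handles ties and out-of-sample prediction more carefully); the function name orq_z and the toy scores are ours.

```r
# Minimal sketch of Ordered Quantile normalization plus standardization.
# Approximates bestNormalize::orderNorm(); not a drop-in replacement.
orq_z <- function(x) {
  n <- length(x)
  # Map average ranks to standard-normal quantiles
  z <- qnorm((rank(x, ties.method = "average") - 0.5) / n)
  # Center and scale so the result has mean 0 and sd 1
  as.numeric(scale(z))
}

# Toy maths scores bunched at the top, mimicking a ceiling effect
maths <- c(10, 25, 36, 38, 39, 40, 40, 40)
round(orq_z(maths), 2)
```

After the transformation the tied top scores remain tied, but the bulk of the distribution is spread out symmetrically instead of piling up near the maximum.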

str(jsp)

'data.frame':   3236 obs. of  9 variables:
 $ school   : Factor w/ 49 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ class    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ gender   : Factor w/ 2 levels "m","f": 2 2 2 1 1 1 1 1 1 1 ...
 $ social   : Factor w/ 9 levels "I","II","III nonmanual",..: 9 9 9 2 2 2 2 2 9 9 ...
 $ student  : Factor w/ 1192 levels "1","2","3","4",..: 1 1 1 2 2 3 3 3 4 4 ...
 $ year     : Factor w/ 3 levels "3","4","5": 1 2 3 1 2 1 2 3 1 2 ...
 $ raven_z  : num  -0.403 -0.403 -0.403 -1.588 -1.588 ...
 $ math_z   : num  -0.351 -0.256 -1.146 -1.48 -1.626 ...
 $ english_z: num  0.7718 0.4376 -0.0204 -1.9804 -1.9337 ...

1.2 Effect of the School on the Student’s Academic Progress

In the “Examples of Math Scores” graph (Figure 1.2), we see, for some selected schools, a comparison between the mathematics scores in the fifth year and those in the third. The lines correspond to classical linear regressions fitted on the students of each school, and are intended to show the score pattern in each school.


Figure 1.2: Examples of Math Scores. Scores in selected schools. The lines correspond to classical linear regressions fitted on the students of each school, and are intended to show the score pattern in each school.

In some schools, the regression line is almost parallel to and completely above that of another school (e.g., 36: light blue over 3: orange), which corresponds to the fact that most students in the upper school got better scores in 5th grade than those in the lower school. Consequently, the top school was more effective in promoting student learning than the bottom one.

When the trend lines of two schools are not parallel, the one higher on the left side corresponds to the school where the most lagging students progressed more from 3rd to 5th grade, while the one higher on the right side is the school where the most proficient students made relatively more progress. Consequently, lines with little slope correspond to schools where the gap between lagging and skilled students is small.
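The per-school lines in the figure are just ordinary least-squares fits computed separately for each school's students. A base-R sketch on simulated data (the school labels, coefficients, and scores below are made up for illustration, not taken from the JSP):

```r
# One classical regression of 5th-year on 3rd-year score per school.
# Simulated data: school "36" is shifted up relative to school "3".
set.seed(1)
d <- data.frame(
  school  = factor(rep(c("3", "36"), each = 30)),
  score_3 = rnorm(60)
)
d$score_5 <- ifelse(d$school == "36", 0.5, -0.5) +
  0.7 * d$score_3 + rnorm(60, sd = 0.3)

# Fit a separate lm() on each school's subset of students
fits <- lapply(split(d, d$school), function(s) lm(score_5 ~ score_3, data = s))
sapply(fits, coef)  # one (intercept, slope) column per school
```

With near-equal slopes, the school with the larger intercept sits uniformly above the other, which is the “parallel lines” case discussed above.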

Having the scores for english and math for the 3rd and 5th year, we could analyze the school’s effect on performance by subject, but we would end up with two classifications of schools, one for english and another for math. Even though we might see how to consolidate those results into a single ranking, this seems like an overly complicated route.

Merging Subjects

We have chosen to merge the english and math scores and treat them simply as a third-year score (score_3_z) and a fifth-year score (score_5_z). Joining the english and math scores this way lets us demonstrate the use of Multilevel Regressions (MLR) without an overly complicated analysis. Following the same criterion, we have kept only the students who stayed in the same school from the 3rd to the 5th year.
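One way such a merge can be done in base R is to stack the two subjects as separate observations sharing a generic score column, then spread the years back out. This is a sketch on a toy data frame; the actual preprocessing of jsp may differ in its details.

```r
# Toy data: one row per student per year, with both subject scores
toy <- data.frame(
  student   = factor(c(1, 1, 2, 2)),
  year      = factor(c(3, 5, 3, 5)),
  math_z    = c(-0.3, 0.1, 1.2, 1.5),
  english_z = c(0.4, 0.6, -0.2, 0.0)
)

# Stack math and english into a single score_z column
long <- rbind(
  data.frame(student = toy$student, subject = "math",
             year = toy$year, score_z = toy$math_z),
  data.frame(student = toy$student, subject = "english",
             year = toy$year, score_z = toy$english_z)
)

# Spread the years, yielding one score_3_z / score_5_z pair
# per student-subject combination
wide <- reshape(long, idvar = c("student", "subject"),
                timevar = "year", direction = "wide")
names(wide) <- sub("^score_z\\.(3|5)$", "score_\\1_z", names(wide))
```

Each student then contributes one row per subject, with both years' scores on that row.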

str(jsp_sc, give.attr = F)
tibble [1,774 × 8] (S3: tbl_df/tbl/data.frame)
 $ school   : Factor w/ 49 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ class    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ gender   : Factor w/ 2 levels "m","f": 2 1 1 2 1 2 2 2 2 1 ...
 $ social   : Factor w/ 9 levels "I","II","III nonmanual",..: 9 2 9 4 9 9 4 2 7 5 ...
 $ student  : Factor w/ 1192 levels "1","2","3","4",..: 1 3 4 6 7 8 11 14 15 16 ...
 $ raven_z  : num [1:1774, 1] -0.445 -0.619 -1.868 -1.449 -1.303 ...
 $ score_3_z: num [1:1774, 1] -0.435 1.598 -0.297 -0.964 -0.571 ...
 $ score_5_z: num [1:1774, 1] -1.15896 1.8403 -0.00456 -2.1354 -0.83616 ...

The outcome variable \(y\) in our model is the 5th-year score score_5_z, and the potential predictor variables are score_3_z, raven_z, school, gender, and social. As we centered score_5_z, the outcome variable is the student’s 5th-year score as a deviation from its mean. As usual, the intercept has the units of \(y\); in our case, due to the scaling, that is the standard deviation of the 5th-year scores. For short, we will write \(\sigma_y\) for the standard deviation of score_5_z.

The intercept corresponds to the case where the numeric predictors are zero, score_3_z = 0 and raven_z = 0. As these variables are centered at their mean, the intercept is the expected 5th-year score for students with average numeric predictors. In other words, the intercept is the expected 5th-year score for an average 3rd-year student.

1.2.1 Causal Model

The primary goal of Mortimer’s study is to find out whether some schools are more effective than others in promoting student learning and development. The improvement of the scores from the third to the fifth year is indicative of the student’s progress in the school. In causal-analysis terminology, the school is the exposure, and the student’s 5th-year score, adjusted for the 3rd-year one, is the outcome.


Figure 1.3: JSP Causal Diagram. Factors that influence student performance.

The “JSP Causal Diagram” (Figure 1.3) shows our causal model, where we consider the school, the student’s cognitive ability, and the student’s other personal traits as drivers of progress in school. We use the Raven score as a proxy for cognitive ability.

In several schools, the distribution of students’ fathers across the socioeconomic classes is quite unbalanced towards the upper or the lower classes. Hence, it seems appropriate to consider that the father’s socioeconomic class influences the selection of the school. We also consider that the father’s class influences the environment and the facilities the student has to develop and study, affecting the student’s scores and cognitive skills.

Finally, we also consider that gender may influence the pupil’s focus and predisposition to learn during the school years.

1.3 The Reference Model

To get an idea of how much our following models improve, we shall build the simplest possible one, where the 5th-year scores are constant. \[\begin{align*} \textrm{score\_5\_z}^{(i)} &= \beta_0 + \epsilon^{(i)} \\ \epsilon^{(i)} &\sim \mathcal{N}(0, \sigma_{\epsilon}^2) \end{align*}\]

(jsp_null <- brm(score_5_z ~ 1, data = jsp_sc))
loo(jsp_null)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: score_5_z ~ 1 
   Data: jsp_sc (Number of observations: 1774) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
        total post-warmup samples = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.00      0.02    -0.05     0.05 1.00     3520     2341

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.00      0.02     0.97     1.03 1.00     3591     2725

Computed from 4000 by 1774 log-likelihood matrix
         Estimate   SE
elpd_loo  -2518.7 27.2
p_loo         1.9  0.1
looic      5037.4 54.4
------
Monte Carlo SE of elpd_loo is 0.0.
All Pareto k estimates are good (k < 0.5).

In this model the intercept \(\beta_0 = 0.00\) is the mean of score_5_z, which is zero due to the centering. Recall that \(\sigma_{\epsilon}\), shown as sigma in the listing above, is in this model the standard deviation of \(y\), and because of the scaling it is equal to \(1\).
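As a quick sanity check of the loo() listing above, looic is defined as \(-2\) times elpd_loo, and the reported numbers do match:

```r
# Values copied from the loo(jsp_null) listing above
elpd_loo <- -2518.7
looic <- -2 * elpd_loo  # LOO information criterion, on the deviance scale
looic                   # 5037.4, matching the listing
```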

1.4 With Intercept by School, Father’s Class, and Student

To assess whether it is appropriate to group by school, by the father’s socioeconomic classification social, and by student, we will fit a model with only an intercept for each of these groups. At the population level, we will include the predictors we expect to use in our final model.

jsp_m0 <- brm(
  score_5_z ~ gender + score_3_z + raven_z +
    (1 | school) + (1 | social) + (1 | student),
  data = jsp_sc, refresh = 0
)

There were 127 divergent transitions after warmup. 
The largest R-hat is 1.06, indicating chains have not mixed.
Effective Samples Size (ESS) is too low, indicating posterior means and medians 
and tail quantiles may be unreliable.

The messages in the fitting output, like the divergent transitions after warmup, come from Stan, and they warn us that it had difficulty sampling the posterior distribution. In this case, we need more iterations, iter = 3000, and more warmup, warmup = 2000 (by default, they are 2000 and 1000, respectively). We also need higher values for the parameters adapt_delta = 0.99 and max_treedepth = 12 (by default, they are \(0.8\) and \(10\), respectively). The documentation referenced in Stan’s warning messages describes these diagnostics in more detail.

jsp_m0 <- brm(
  score_5_z ~ gender + score_3_z + raven_z +
    (1 | school) + (1 | social) + (1 | student),
  data = jsp_sc,
  seed = 123,
  iter = 3000, warmup = 2000,
  control = list(adapt_delta = 0.99, max_treedepth = 12)
)
jsp_m0

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: score_5_z ~ gender + score_3_z + raven_z + (1 | school) + (1 | social) + (1 | student) 
   Data: jsp_sc (Number of observations: 1774) 
Samples: 4 chains, each with iter = 3000; warmup = 2000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~school (Number of levels: 48) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.24      0.03     0.18     0.31 1.00     1575     2560

~social (Number of levels: 9) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ES