Chapter 6 Contrast coding

Whenever one uses a categorical factor as a predictor in a linear (mixed) model, for example when testing the difference in a dependent variable between two or three experimental conditions, it is necessary to code the discrete factor levels into numeric predictor variables. This coding is termed contrast coding. For example, in the linear modeling chapter, we coded two experimental conditions as \(-1\) and \(+1\), thereby implementing a sum contrast. Contrasts are the numeric values assigned to these predictor variables: they encode specific hypotheses about differences between factor levels and create the predictor terms used to test these hypotheses in linear models, including Bayesian linear (mixed) models.
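To make the \(-1\)/\(+1\) coding concrete, the following minimal sketch codes a two-level factor with a sum contrast and fits a least-squares line by hand. The response times, the level names `a` and `b`, and all variable names are made up for illustration and do not come from the chapter; the design is assumed to be balanced.

```python
from statistics import mean

# Hypothetical response times (ms) for two conditions; the values are made up.
condition = ["a", "a", "a", "b", "b", "b"]
rt = [400.0, 420.0, 410.0, 450.0, 470.0, 460.0]

# Sum contrast: level "a" is coded as -1 and level "b" as +1.
coding = {"a": -1.0, "b": 1.0}
x = [coding[c] for c in condition]

# Closed-form least-squares estimates for a single numeric predictor.
x_bar, y_bar = mean(x), mean(rt)
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, rt))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept = y_bar - slope * x_bar

mean_a = mean(y for y, c in zip(rt, condition) if c == "a")
mean_b = mean(y for y, c in zip(rt, condition) if c == "b")

print(intercept, (mean_a + mean_b) / 2)  # intercept vs. average of the two condition means
print(slope, (mean_b - mean_a) / 2)      # slope vs. half the difference between conditions
```

In this balanced example, the intercept equals the average of the two condition means, and the slope equals half the difference between them, because the two coded values are two units apart.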

This chapter introduces contrast coding. The descriptions are in large part taken from Schad et al. (2020b) (which is published under a CC-BY license) and adapted for the current chapter.

Consider a situation where we want to test differences in a dependent variable between three factor levels. An example could be differences in response times between three levels of word class (noun, verb, adjective). We might be interested in whether word class influences response times. In frequentist statistics, one way to approach this question would be to run an ANOVA and compute an omnibus F-test for whether word class explains response times. However, if such an omnibus approach yields support for an influence of word class on response times, it remains unclear where this effect actually comes from, i.e., whether it is driven by the nouns, the verbs, or the adjectives. Scientists, by contrast, typically have a priori expectations about which groups differ from each other. In this chapter, we will show how to test such specific hypotheses directly, which gives a lot of control over the analyses. Specifically, we show how planned comparisons between specific conditions (groups) or clusters of conditions are implemented as contrasts. This is a very effective way to align expectations with the statistical model.
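As a rough illustration of what a planned comparison between conditions or clusters of conditions amounts to, the sketch below applies comparison weights to the condition means of the word class example. The response times, the two comparisons, and all names are hypothetical. The sketch only computes the compared quantities from the means; how such comparisons are turned into predictor columns of a model is a separate step that is not shown here.

```python
from statistics import mean

# Hypothetical response times (ms) per word class; the values are made up.
rt_by_class = {
    "noun": [500.0, 520.0, 510.0],
    "verb": [540.0, 560.0, 550.0],
    "adjective": [580.0, 600.0, 590.0],
}
means = {level: mean(values) for level, values in rt_by_class.items()}

# Each planned comparison is a set of weights over the factor levels.
# The weights of each comparison sum to zero.
comparisons = {
    # one specific condition against another
    "noun_vs_verb": {"noun": 1.0, "verb": -1.0, "adjective": 0.0},
    # one condition against a cluster of conditions (their average)
    "noun_vs_rest": {"noun": 1.0, "verb": -0.5, "adjective": -0.5},
}

# The estimate of a comparison is the weighted sum of the condition means.
estimates = {
    name: sum(weights[level] * means[level] for level in means)
    for name, weights in comparisons.items()
}

for name, value in estimates.items():
    print(name, value)
```

Unlike an omnibus test, each of these quantities corresponds to one a priori question, for example whether nouns are read faster than verbs.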

References

Schad, Daniel J., Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. 2020a. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110.

Schad, Daniel J., Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. 2020b. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110. Elsevier: 104038.