7.3 Interactions in generalized linear models (with non-linear link functions)

We next look at generalized linear models, where a latent linear predictor is passed through a non-linear transformation (the inverse link function) to predict the dependent variable. Examples of generalized linear models include logistic regression models and models assuming a log-normal or a Poisson distribution. Here, we treat an example with a logistic model in a \(2 \times 2\) factorial between-subject design. In the logistic model, the latent linear predictor \(\eta\) is mapped to a probability via the inverse logit (i.e., logistic) function: \(p(y=1 \mid x, \beta) = \frac{1}{1 + \exp(-\eta)}\). For example, in our \(2 \times 2\) factorial design with main effects A and B and their interaction, \(\eta\) is computed as a linear combination of the intercept, the main effects, and their interaction: \(\eta = \beta_0 + \beta_A x_A + \beta_B x_B + \beta_{A \times B} x_{A \times B}\).

Thus, there is a latent level of linear predictions (\(\eta\)), which are then passed through a non-linear link function to predict the probability that the observed data is a success (\(p(y = 1)\)). We will use this logistic model to analyze an example data set where the dependent variable is dichotomous, coded as either a 1 (indicating success) or a 0 (indicating failure).
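To make this mapping concrete, the following short R sketch builds a linear predictor from hypothetical coefficient values (these numbers are invented purely for illustration and are not estimates from the data below) and passes it through the inverse logit, which is available in R as plogis().

## Hypothetical coefficients, chosen only to illustrate the mapping.
beta_0  <- 0    # intercept
beta_A  <- 1    # main effect of A
beta_B  <- 1    # main effect of B
beta_AB <- 0.5  # interaction A x B

## Contrast codes for one example condition (here coded as +/- 0.5).
x_A  <- 0.5
x_B  <- 0.5
x_AB <- x_A * x_B

## Latent linear predictor eta; the inverse logit maps it to p(y = 1).
eta <- beta_0 + beta_A * x_A + beta_B * x_B + beta_AB * x_AB
p   <- plogis(eta)  # identical to 1 / (1 + exp(-eta))
p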

We load a simulated data set where the dependent variable codes whether a subject performed a task successfully (pDV = 1) or not (pDV = 0). Moreover, the data set has two between-subject factors A and B. The means for each of the four conditions are shown in Table 7.3; frequentist 95% confidence intervals can be computed alongside these means (see the sketch after the table).

## tibble [200 × 4] (S3: tbl_df/tbl/data.frame)
##  $ A  : Factor w/ 2 levels "A1","A2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ B  : Factor w/ 2 levels "B1","B2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ pDV: int [1:200] 0 0 0 1 0 0 0 0 0 0 ...
##  $ id : int [1:200] 1 2 3 4 5 6 7 8 9 10 ...
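As a side note, a data set with this structure could be simulated roughly as follows. This is only a sketch: the success probabilities are taken from Table 7.3 below, and the object name df_success is hypothetical (the data set used here was loaded, not simulated in the text).

library(dplyr)
set.seed(1)
## Simulate 50 subjects per cell of the 2 x 2 between-subject design.
df_success <- tibble(
  A  = factor(rep(c("A1", "A2"), each = 100)),
  B  = factor(rep(rep(c("B1", "B2"), each = 50), times = 2)),
  id = 1:200
) %>%
  mutate(
    ## Cell-wise success probabilities (values as in Table 7.3).
    prob = case_when(A == "A1" & B == "B1" ~ 0.2,
                     A == "A1" & B == "B2" ~ 0.5,
                     A == "A2" & B == "B1" ~ 0.2,
                     A == "A2" & B == "B2" ~ 0.8),
    ## Draw dichotomous responses from Bernoulli distributions.
    pDV = rbinom(n(), size = 1, prob = prob)
  ) %>%
  select(A, B, pDV, id)
str(df_success)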
TABLE 7.3: Summary statistics per condition for the simulated data.

Factor A   Factor B   N data   Means
A1         B1         50       0.2
A1         B2         50       0.5
A2         B1         50       0.2
A2         B2         50       0.8
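The summary statistics in Table 7.3 could be computed along the following lines. This is a sketch that assumes the hypothetical data frame name df_success from above and uses a simple normal-approximation (Wald) interval for each condition mean; other interval types for proportions would also be reasonable.

library(dplyr)
df_success %>%
  group_by(A, B) %>%
  summarize(N    = n(),
            Mean = mean(pDV),
            ## Normal-approximation 95% confidence interval for a proportion.
            lower = Mean - 1.96 * sqrt(Mean * (1 - Mean) / N),
            upper = Mean + 1.96 * sqrt(Mean * (1 - Mean) / N),
            .groups = "drop")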

To analyze these data, we use scaled sum contrasts, as we had done above for the \(2 \times 2\) design with response times as the dependent variable; this coding allows us to interpret the coefficients for factors A and B as main effects. Next, we fit a generalized linear model. The model specification is the same as for the model with response times, with two differences: first, the dependent variable is now the dichotomous variable pDV; second, the family argument is specified as family = binomial(link = "logit") to indicate the logistic model.
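A sketch of the corresponding R code is shown below; the data frame name df_success and the model name m_glm are hypothetical, but the contrast coding and the family argument follow the description above. The second level of each factor is coded +0.5, so each main effect estimates the difference of the second minus the first level on the logit scale.

## Scaled sum contrasts: -0.5 / +0.5 for the two levels of each factor.
contrasts(df_success$A) <- c(-0.5, +0.5)
contrasts(df_success$B) <- c(-0.5, +0.5)

## Logistic regression: generalized linear model with binomial family and logit link.
m_glm <- glm(pDV ~ A * B, data = df_success,
             family = binomial(link = "logit"))
round(coef(summary(m_glm)), 2)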

##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)    -0.35       0.17   -2.06     0.04
## A1              0.69       0.34    2.06     0.04
## B1              2.08       0.34    6.17     0.00
## A1:B1           1.39       0.67    2.06     0.04

The results from this analysis show that the estimates for the two main effects (“A1” and “B1”) as well as for the interaction (“A1:B1”) are positive and statistically significant; equivalently, their 95% confidence intervals do not include zero. We could now proceed to perform likelihood ratio tests to assess the evidence for each of the effects.
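For example, a likelihood ratio test for the interaction could look roughly like the following sketch (again using the hypothetical object names from above): the full model is compared against a reduced model without the A:B term.

## Reduced model without the interaction term.
m_red <- glm(pDV ~ A + B, data = df_success,
             family = binomial(link = "logit"))

## Likelihood ratio test comparing the reduced against the full model.
anova(m_red, m_glm, test = "Chisq")

## Alternatively, drop1() runs the analogous test for each droppable term.
drop1(m_glm, test = "Chisq")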