2.11 Exercises

Exercise 2.1 Practice using the qt function

Take an independent random sample of size 142 from a normal distribution with mean 123 and standard deviation 70. Next, pretend that we don’t know the population parameters (the mean and standard deviation). Using the data, we compute the MLEs of the mean and standard deviation, obtaining a sample mean of 145.242 and a sample standard deviation of 50.885.

  • Compute the estimated standard error using the sample standard deviation provided above.
  • What are your degrees of freedom for the relevant t-distribution?
  • Calculate the absolute critical t-value for a 95% confidence interval using the degrees of freedom you just computed.
  • Next, compute the lower bound of the 95% confidence interval using the estimated standard error and the critical t-value.
  • Finally, compute the upper bound of the 95% confidence interval using the estimated standard error and the critical t-value.
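
The critical t-value can be obtained with `qt`. The following sketch uses a hypothetical df of 20, not the exercise’s degrees of freedom; it only illustrates the `qt()` call:

```r
## illustration only: critical t-value for a 95% CI with hypothetical df = 20
alpha <- 0.05
df <- 20
## two-sided critical value: the 97.5th percentile of the t(df) distribution
t_crit <- qt(1 - alpha / 2, df = df)
t_crit
## a 95% CI is then: estimate +/- t_crit * SE
```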

Exercise 2.2 Computing the p-value

A paired t-test is done with data from 10 participants. The t-value from the test is 2.1. What is the p-value associated with a two-sided null hypothesis test?
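The two-sided p-value can be computed from the t-distribution CDF with `pt`. The sketch below uses placeholder values (t = 2.5, df = 19), not the exercise’s numbers:

```r
## illustration only: two-sided p-value from an observed t-value,
## with placeholder values t = 2.5 and df = 19
t_obs <- 2.5
df <- 19
p_two_sided <- 2 * (1 - pt(abs(t_obs), df = df))
p_two_sided
```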

Exercise 2.3 Computing the t-value

If the p-value from a two-sided null hypothesis test had been 0.09, what would be the associated absolute t-value (i.e., ignoring the sign on the t-value)? The number of participants is 10, as above.
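Going in the other direction, `qt` inverts the p-value computation. The sketch below uses placeholder values (p = 0.05, df = 15), not the exercise’s numbers:

```r
## illustration only: recover the absolute t-value from a two-sided
## p-value, with placeholder values p = 0.05 and df = 15
p <- 0.05
df <- 15
t_abs <- qt(1 - p / 2, df = df)
t_abs
## sanity check: 2 * (1 - pt(t_abs, df)) recovers p
```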

Exercise 2.4 Type I and II error

Given that Type I error is 0.01, what is the highest possible value for Type II error?

Exercise 2.5 Practice with the paired t-test

In a self-paced reading study, Grodner and Gibson (2005) investigated subject vs. object relative clauses. They analyzed the reading times at the relative clause verb. However, a reviewer objects that reading times for the whole sentence should be used to evaluate the difference between the two conditions, because one cannot know in advance where the difficulty might arise. It isn’t clear whether one should then use mean reading times over the entire sentence, or total reading times (summing up all the reading times over the entire sentence). Carry out a by-subjects paired t-test on (a) the reading times at the critical relative clause verb, (b) mean reading times over all words in the two sentence types, and (c) total reading times over all words in the two sentence types. Compare the t-values across the three tests, and decide what the appropriate dependent variable might be. (Note: there is no correct answer here.)

The data are loaded and pre-processed as follows. The code below gives you the reading times for the data at the relative clause verb. You will have to work out how to obtain mean or total reading times for the whole sentence in each condition.

## load data from lingpsych package:
library(lingpsych)
data("df_gg05e1_full")
## get data from relative clause verb:
df_gge1crit <- subset(
  df_gg05e1_full,
  (condition == "objgap" &
    word_position == 6) |
    (condition == "subjgap" & word_position == 4)
)
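
One possible shape of the by-subjects analysis at the critical verb is sketched below, assuming the data frame has columns `subject` and `rawRT` (check with `head(df_gge1crit)` before running; the column names are an assumption here):

```r
## sketch: aggregate to one mean RT per subject per condition
bysubj <- aggregate(rawRT ~ subject + condition,
                    data = df_gge1crit, FUN = mean)
## paired t-test on the by-subject means
with(bysubj,
     t.test(rawRT[condition == "objgap"],
            rawRT[condition == "subjgap"],
            paired = TRUE))
## for (b) and (c), redo the aggregation over all word positions in
## df_gg05e1_full, with FUN = mean or FUN = sum, before the t-test
```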

Exercise 2.6 Using the paired t-test to test for main effects and interactions

Using the data from Fedorenko, Gibson, and Rohde (2006) discussed in this chapter, carry out all seven paired t-tests to investigate all main effects and interactions.
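As a reminder of the general recipe, the sketch below uses simulated 2 x 2 data with hypothetical factors `A` and `B`; the real factor and column names in the Fedorenko, Gibson, and Rohde (2006) data will differ. A main effect of one factor is a paired t-test on the by-subject means after averaging over the other factor:

```r
## simulated 2 x 2 within-subjects data (hypothetical factors A and B)
set.seed(42)
d <- expand.grid(subject = 1:20, A = c("lo", "hi"), B = c("lo", "hi"))
d$rt <- rnorm(nrow(d), mean = 500, sd = 100)
## main effect of A: average over B within each subject, then pair
mA <- aggregate(rt ~ subject + A, data = d, FUN = mean)
with(mA, t.test(rt[A == "lo"], rt[A == "hi"], paired = TRUE))
## the A x B interaction is a paired t-test on the by-subject
## difference of differences: (A_lo - A_hi) within B_lo vs. within B_hi
```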

Exercise 2.7 Explicitly testing for a claimed interaction


S. Vasishth and Lewis (2006) present two self-paced reading experiments in Hindi, their Experiments 2 and 3. In Experiment 2, the distance between a grammatical subject and a verb was manipulated by inserting an intervening inanimate noun between the two words; the expectation was that the intervener would slow processing at the verb. Surprisingly, a faster reading time was seen at the verb when the intervener was present. The paper reported a linear mixed-effects model-based ANOVA (analysis of variance), which showed a near-significant speedup effect of the distance manipulation. The approximate statistics extracted from the paper are \(t(57) = -1.655\). The approximate difference in means, guessed at from their Figures 7 and 8, is about \(200\) ms. By contrast, in Experiment 3, new subjects who hadn’t participated in Experiment 2 were shown sentences with the same distance manipulation as in Experiment 2, but with the difference that the intervening noun was animate. The hypothesis was that the animacy status of the intervening noun would either reduce or neutralize the speedup, leading to a smaller speedup effect. The reported statistics show that the distance manipulation was not statistically significant. The approximate t-value is \(t(50) = -0.245\). Figures 12 and 13 in the paper suggest that the approximate difference between the conditions is about \(100\) ms.

Based on these t-values and the estimates of the differences in means, the authors effectively claim that there is a significant difference between the two experiments. Given the above information, is this conclusion justified? Show the results of an appropriate statistical test and use them to argue whether or not their conclusion is supported. (Hint: the two-sample t-test will be useful here.)
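
One possible starting point (a sketch, not the full answer): each experiment’s standard error can be recovered from its reported t-value and estimated difference in means, since \(t = d/\mathrm{SE}\):

```r
## recover the standard error of a difference d from its t-value
se_from_t <- function(d, t) abs(d / t)
se1 <- se_from_t(-200, -1.655)  ## Experiment 2
se2 <- se_from_t(-100, -0.245)  ## Experiment 3
## the difference between the two effects, relative to its own SE;
## interpreting this statistic is left to the exercise
(-200 - (-100)) / sqrt(se1^2 + se2^2)
```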

Exercise 2.8 Using the power.t.test function

[In this exercise, assume that Type I error probability is 0.05 unless otherwise stated, and that we are doing a one-sample t-test, with a two-sided hypothesis test.]

You are given that the effect size is 24, with a 95% confidence interval of [19, 29]. The standard deviation can range from 119 to 300.

First, using the power.t.test function with sample size 163, draw three power curves, one for each of the three effect sizes (24, 19, and 29), letting the standard deviation range from 119 to 300. Draw a single plot showing all three curves, with the standard deviation on the x-axis and power on the y-axis.

Then, redo the power analysis using simulation instead of power.t.test. Repeat the following 10,000 times: sample data of size 163 from a normal distribution with each of the three means above (24, 19, and 29), with standard deviations ranging between 119 and 300. The result of the simulation should be similar to the plot in the chapter: the x-axis will show standard deviations ranging from 119 to 300, and the y-axis will show power. Three lines should be drawn, representing the power for each of the three means considered. (Hint: this exercise just asks you to reuse code from the chapter, with some minimal changes.)
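The basic shape of a single power.t.test call, and of a sweep over standard deviations, is sketched below with a placeholder sd of 200 (the exercise sweeps sd from 119 to 300):

```r
## illustration only: one-sample, two-sided power at a placeholder sd = 200
power.t.test(n = 163, delta = 24, sd = 200,
             sig.level = 0.05, type = "one.sample",
             alternative = "two.sided")$power
## sweeping over sd values gives one power curve:
sds <- seq(119, 300, by = 1)
pow24 <- sapply(sds, function(s)
  power.t.test(n = 163, delta = 24, sd = s,
               sig.level = 0.05, type = "one.sample")$power)
## plot(sds, pow24, type = "l") and add lines for delta = 19 and 29
```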

Now answer the following questions:

  1. What will be the statistical power if the effect size is 24, sample size is 163 and standard deviation is 300?
  2. What sample size is needed to obtain a statistical power of 0.80 when the effect size is 24, and standard deviation is 300?
  3. Suppose now that Type I error probability is changed to 0.005. What sample size is needed to obtain a statistical power of 0.80 when the effect size is 24, and standard deviation is 300?

References

Fedorenko, Evelina, Edward Gibson, and Douglas Rohde. 2006. “The Nature of Working Memory Capacity in Sentence Comprehension: Evidence Against Domain-Specific Working Memory Resources.” Journal of Memory and Language 54 (4): 541–53.
Grodner, Daniel, and Edward Gibson. 2005. “Consequences of the Serial Nature of Linguistic Input.” Cognitive Science 29: 261–90.
Vasishth, Shravan, and Richard L. Lewis. 2006. “Argument-Head Distance and Processing Complexity: Explaining Both Locality and Antilocality Effects.” Language 82 (4): 767–94.