1.9 Exercises

Exercise 1.1 Practice using the pnorm function

Given a normal distribution with mean 500 and standard deviation 100, use the pnorm function to calculate the probability of obtaining values between 200 and 800 from this distribution.

Exercise 1.2 Practice using the pnorm function

Calculate the following probabilities. Given a normal distribution with mean 800 and standard deviation 150, what is the probability of getting

a score of 700 or less
a score of 900 or more
a score of 800 or more

Exercise 1.3 Practice using the pnorm function

Given a normal distribution with mean 600 and standard deviation 200, what is the probability of getting

a score of 550 or less.
a score between 300 and 800.
a score of 900 or more.
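
The three kinds of probability asked for in Exercises 1.1 to 1.3 (a lower tail, an upper tail, and an interval) can each be obtained from pnorm. The following is a minimal sketch on a standard normal distribution; the values of mu and sigma and the cutoffs are hypothetical and chosen only for illustration, not taken from the exercises.

```r
# Hypothetical parameters, chosen only for illustration.
mu <- 0
sigma <- 1

# P(X <= 1): area to the left of 1 (pnorm returns the lower tail by default).
p_lower <- pnorm(1, mean = mu, sd = sigma)

# P(X >= 1): area to the right of 1.
p_upper <- pnorm(1, mean = mu, sd = sigma, lower.tail = FALSE)

# P(-1 <= X <= 1): difference between two lower-tail areas.
p_between <- pnorm(1, mean = mu, sd = sigma) -
  pnorm(-1, mean = mu, sd = sigma)

c(p_lower, p_upper, p_between)
```

Because the normal distribution is continuous, "or less" and "less than" give the same probability, so no correction is needed at the boundaries.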

Exercise 1.4 Practice using the qnorm function

Consider a normal distribution with mean 1 and standard deviation 1. Compute the lower and upper boundaries such that:

the area (the probability) to the left of the lower boundary is 0.10.
the area (the probability) to the left of the upper boundary is 0.90.

Exercise 1.5 Practice using the qnorm function

Consider a normal distribution with mean 650 and standard deviation 125. There exist two quantiles, the lower quantile q1 and the upper quantile q2, that are equidistant from the mean 650, such that the area under the normal curve between q1 and q2 is 80%. Find q1 and q2.
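
As a hedged illustration of how qnorm can be used for this kind of problem, the sketch below finds the boundaries of a central region on a standard normal distribution. The names central and tail_area, and the choice of a 95% region, are assumptions made for the example and differ from the values in the exercises.

```r
# Hypothetical setup: a standard normal and a central 95% region.
mu <- 0
sigma <- 1
central <- 0.95

# A symmetric central region leaves equal area in each tail.
tail_area <- (1 - central) / 2

# qnorm maps a lower-tail probability to the corresponding quantile.
q_lower <- qnorm(tail_area, mean = mu, sd = sigma)
q_upper <- qnorm(1 - tail_area, mean = mu, sd = sigma)

# Check: the area between the two quantiles recovers the central probability.
area <- pnorm(q_upper, mean = mu, sd = sigma) -
  pnorm(q_lower, mean = mu, sd = sigma)

c(q_lower, q_upper, area)
```

Since qnorm is the inverse of pnorm, feeding the quantiles back into pnorm is a quick way to confirm that the boundaries are correct.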

Exercise 1.6 Maximum likelihood estimation

The function dnorm gives the likelihood of a data point (or of multiple data points), given a value for the mean and a value for the standard deviation (sd). Using dnorm, compute

the likelihood of the data point 420 assuming a mean of 500 and standard deviation 100.
the likelihood of the data point 420 assuming a mean of 420 and standard deviation 100.
the likelihood of the data point 420 assuming a mean of 400 and standard deviation 100.

Exercise 1.7 Maximum likelihood estimation

You are given \(10\) independent and identically distributed data points that are assumed to come from a Normal distribution with unknown mean and unknown standard deviation:

##  [1] 497 496 513 500 502 491 493 510 513 512

The function dnorm gives the likelihood given multiple data points and a value for the mean and the standard deviation. The log-likelihood can be computed by typing dnorm(...,log=TRUE).

The product of the likelihoods for two independent data points can be computed like this: Suppose we have two independent and identically distributed data points 5 and 10. Then, assuming that the Normal distribution they come from has mean 10 and standard deviation 5, the joint likelihood of these is:

dnorm(5, mean = 10, sd = 5) * dnorm(10, mean = 10, sd = 5)

## [1] 0.003861

It is easier to do this on the log scale, because then one can add instead of multiplying. This is because \(\log(x\times y)= \log(x) + \log(y)\). For example:

log(2 * 3)

## [1] 1.792

log(2) + log(3)

## [1] 1.792

So the joint log likelihood of the two data points is:

dnorm(5, mean = 10, sd = 5, log = TRUE) +
  dnorm(10, mean = 10, sd = 5, log = TRUE)

## [1] -5.557

Even more compactly:

sum(dnorm(c(5, 10), mean = 10, sd = 5, log = TRUE))

## [1] -5.557

Given the 10 data points above, calculate the maximum likelihood estimate (MLE) of the expectation.
Compute the sum of the log-likelihoods of the data, using as the mean the MLE from the sample and a standard deviation of 5.
What is the sum of the log-likelihood if the mean used to compute the log-likelihood is 500.7 (assume a standard deviation of 5)?
Which value for the mean, the MLE or 500.7, gives the higher log-likelihood?
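
The idea behind these questions can also be explored numerically. The sketch below assumes a small hypothetical data set y (not the data given in the exercise) and a helper function loglik, which is introduced here only for illustration. It uses optimize from base R to search for the mean that maximizes the summed log-likelihood while the standard deviation is held fixed at 5.

```r
# Hypothetical data, not the data from the exercise.
y <- c(4, 6, 8, 10, 12)

# Summed log-likelihood as a function of the mean, with the sd held fixed.
loglik <- function(mu, data, sigma = 5) {
  sum(dnorm(data, mean = mu, sd = sigma, log = TRUE))
}

# Search numerically over a range that brackets the data.
fit <- optimize(loglik, interval = range(y), data = y, maximum = TRUE)

fit$maximum  # value of the mean found by the search
mean(y)      # sample mean, for comparison
```

For these data, the numerical search should land very close to the sample mean, which suggests a way to check an answer computed by hand.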

Exercise 1.8 Generating bivariate data

Generate 50 data points from two random variables X and Y, where \(X\sim Normal(50,100)\) and \(Y\sim Normal(100,20)\). The correlation between the random variables is 0.7. Plot the simulated data points from Y against those from X.
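
One possible way to approach this kind of simulation is with the mvrnorm function from the MASS package. The sketch below makes two assumptions that should be checked against the chapter: that the second parameter of \(Normal(\cdot,\cdot)\) is a standard deviation, and that MASS is installed. All numerical values (means, standard deviations, correlation) are hypothetical and differ from those in the exercise.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)  # for reproducibility of the simulated values

# Hypothetical parameters, not those of the exercise.
mu <- c(0, 10)
sd_x <- 1
sd_y <- 2
rho <- 0.5

# The covariance is the correlation times the two standard deviations;
# the variances (squared sds) go on the diagonal.
Sigma <- matrix(c(sd_x^2,            rho * sd_x * sd_y,
                  rho * sd_x * sd_y, sd_y^2),
                nrow = 2, byrow = TRUE)

# Each row of xy is one simulated (X, Y) pair.
xy <- mvrnorm(n = 50, mu = mu, Sigma = Sigma)
colnames(xy) <- c("X", "Y")

plot(xy[, "X"], xy[, "Y"], xlab = "X", ylab = "Y")
```

With only 50 draws, the sample correlation cor(xy[, "X"], xy[, "Y"]) will vary around rho rather than match it exactly.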

Exercise 1.9 Generating multivariate data

The bivariate case can be generalized to more than two dimensions. Generate 50 data points from three random variables X, Y, and Z, where \(X\sim Normal(50,100)\), \(Y\sim Normal(100,20)\), and \(Z\sim Normal(200,50)\). The correlation between the random variables X and Y is 0.5, between X and Z is 0.2, and between Y and Z is 0.7. Here, you will have to define a \(3\times 3\) variance-covariance matrix, with the pairwise covariances in the off-diagonals. Plot the simulated data points as two-dimensional figures: Y against X, Y against Z, and X against Z.
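
A \(3\times 3\) variance-covariance matrix can be assembled from a vector of standard deviations and a correlation matrix. The sketch below shows one way to do this in base R; the names sds and Rho and all numerical values are hypothetical, and the standard deviations are assumed to be given directly (rather than variances).

```r
# Hypothetical standard deviations and correlations, not those of the exercise.
sds <- c(1, 2, 3)
Rho <- matrix(c(1.0, 0.3, 0.1,
                0.3, 1.0, 0.4,
                0.1, 0.4, 1.0),
              nrow = 3, byrow = TRUE)

# Scale each correlation by the two corresponding standard deviations:
# Sigma[i, j] = Rho[i, j] * sds[i] * sds[j].
Sigma <- diag(sds) %*% Rho %*% diag(sds)

Sigma

# Check: converting back to correlations recovers Rho.
cov2cor(Sigma)

# A valid covariance matrix must have strictly positive eigenvalues
# (up to numerical error) to be usable for simulation.
eigen(Sigma)$values
```

A matrix built this way can then be passed as the Sigma argument of a multivariate normal sampler such as MASS::mvrnorm, and the pairwise plots can be drawn from the columns of the result.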