1.9 Exercises
Exercise 1.1 Practice using the pnorm function
Given a normal distribution with mean 500 and standard deviation 100, use the pnorm function to calculate the probability of obtaining values between 200 and 800 from this distribution.
Exercise 1.2 Practice using the pnorm function
Calculate the following probabilities. Given a normal distribution with mean 800 and standard deviation 150, what is the probability of getting
- a score of 700 or less
- a score of 900 or more
- a score of 800 or more
Exercise 1.3 Practice using the pnorm function
Given a normal distribution with mean 600 and standard deviation 200, what is the probability of getting
- a score of 550 or less.
- a score between 300 and 800.
- a score of 900 or more.
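The three kinds of probability asked for in Exercises 1.1 to 1.3 (at most a value, at least a value, between two values) each correspond to a common pnorm idiom. The following is a minimal sketch on a standard normal distribution; the mean, standard deviation, and cut-off values are illustrative assumptions and are deliberately not the ones used in the exercises.

```r
# Illustrative values only: a standard normal, not the exercise distributions.
mu <- 0
sigma <- 1

# P(X <= 1): pnorm returns the lower-tail (cumulative) probability by default.
p_below <- pnorm(1, mean = mu, sd = sigma)

# P(X >= 1): ask for the upper tail directly.
p_above <- pnorm(1, mean = mu, sd = sigma, lower.tail = FALSE)

# P(-1 <= X <= 1): difference of two cumulative probabilities.
p_between <- pnorm(1, mean = mu, sd = sigma) - pnorm(-1, mean = mu, sd = sigma)

print(c(below = p_below, above = p_above, between = p_between))
```

Because the normal distribution is continuous, "or less" and "less than" give the same probability, so no correction at the boundary is needed.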
Exercise 1.4 Practice using the qnorm function
Consider a normal distribution with mean 1 and standard deviation 1. Compute the lower and upper boundaries such that:
- the area (the probability) to the left of the lower boundary is 0.10.
- the area (the probability) to the left of the upper boundary is 0.90.
Exercise 1.5 Practice using the qnorm function
Consider a normal distribution with mean 650 and standard deviation 125. There exist two quantiles, the lower quantile q1 and the upper quantile q2, that are equidistant from the mean 650, such that the area under the curve of the Normal between q1 and q2 is 80%. Find q1 and q2.
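One way to approach quantile questions like those in Exercises 1.4 and 1.5 is to convert the desired central area into two tail probabilities and pass those to qnorm. The sketch below does this for a standard normal and a 95% central area; both choices are illustrative assumptions, not the values from the exercises.

```r
# Illustrative values only: standard normal, 95% central area.
mu <- 0
sigma <- 1
coverage <- 0.95

# A central area of `coverage` leaves (1 - coverage) / 2 in each tail.
tail_prob <- (1 - coverage) / 2

q_lower <- qnorm(tail_prob, mean = mu, sd = sigma)
q_upper <- qnorm(1 - tail_prob, mean = mu, sd = sigma)

# Sanity check: pnorm is the inverse of qnorm, so this recovers `coverage`.
area <- pnorm(q_upper, mean = mu, sd = sigma) - pnorm(q_lower, mean = mu, sd = sigma)

print(c(lower = q_lower, upper = q_upper, area = area))
```

Since the normal distribution is symmetric, the two quantiles obtained this way are equidistant from the mean.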
Exercise 1.6 Maximum likelihood estimation
The function dnorm gives the likelihood given a data point (or multiple data) and a value for the mean and the standard deviation (sd). Using dnorm, compute
- the likelihood of the data point 420 assuming a mean of 500 and standard deviation 100.
- the likelihood of the data point 420 assuming a mean of 420 and standard deviation 100.
- the likelihood of the data point 420 assuming a mean of 400 and standard deviation 100.
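Because dnorm is vectorized over its mean argument, the likelihood of one data point under several candidate means can be computed in a single call. A minimal sketch follows; the observation, the candidate means, and the standard deviation are hypothetical values chosen for illustration.

```r
# Hypothetical observation and candidate means, for illustration only.
obs <- 10
candidate_means <- c(8, 10, 12)

# dnorm is vectorized over `mean`: one likelihood value per candidate.
lik <- dnorm(obs, mean = candidate_means, sd = 2)
names(lik) <- candidate_means

print(lik)
```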
Exercise 1.7 Maximum likelihood estimation
You are given \(10\) independent and identically distributed data points that are assumed to come from a Normal distribution with unknown mean and unknown standard deviation:
x
## [1] 497 496 513 500 502 491 493 510 513 512
The function dnorm gives the likelihood given multiple data points and a value for the mean and the standard deviation. The log-likelihood can be computed by typing dnorm(...,log=TRUE).
The product of the likelihoods for two independent data points can be computed like this: Suppose we have two independent and identically distributed data points 5 and 10. Then, assuming that the Normal distribution they come from has mean 10 and standard deviation 5, the joint likelihood of these is:
dnorm(5, mean = 10, sd = 5) * dnorm(10, mean = 10, sd = 5)
## [1] 0.003861
It is easier to do this on the log scale, because then one can add instead of multiplying. This is because \(\log(x\times y)= \log(x) + \log(y)\). For example:
log(2 * 3)
## [1] 1.792
log(2) + log(3)
## [1] 1.792
So the joint log likelihood of the two data points is:
dnorm(5, mean = 10, sd = 5, log = TRUE) +
  dnorm(10, mean = 10, sd = 5, log = TRUE)
## [1] -5.557
Even more compactly:
sum(dnorm(c(5, 10), mean = 10, sd = 5, log = TRUE))
## [1] -5.557
- Given the 10 data points above, calculate the maximum likelihood estimate (MLE) of the expectation.
- Calculate the sum of the log-likelihoods of the data x, using the MLE from the sample as the mean and a standard deviation of 5.
- What is the sum of the log-likelihoods if the mean used to compute the log-likelihood is 500.7 (assume a standard deviation of 5)?
- Which value for the mean, the MLE or 500.7, gives the higher log-likelihood?
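Comparing candidate means is easier when the summed log-likelihood is wrapped in a small function. The sketch below does this for a hypothetical data vector y with a fixed standard deviation of 2; the data, the function name loglik, and the competing value are all assumptions made for illustration, not part of the exercise. It relies on the standard result that, for a Normal distribution, the MLE of the mean is the sample mean.

```r
# Hypothetical data, chosen only for illustration (not the exercise's x).
y <- c(4, 6, 5, 7, 3)

# Sum of Normal log-likelihoods of `data` for a candidate mean `mu`;
# the standard deviation is held fixed at an assumed value.
loglik <- function(mu, data = y, sigma = 2) {
  sum(dnorm(data, mean = mu, sd = sigma, log = TRUE))
}

mu_hat <- mean(y)         # MLE of the mean for a Normal: the sample mean
mu_other <- mu_hat + 0.5  # an arbitrary competing value

ll_hat <- loglik(mu_hat)
ll_other <- loglik(mu_other)
print(c(mle = ll_hat, other = ll_other))

# Evaluate the log-likelihood on a grid of means to see where it peaks.
grid <- seq(3, 7, by = 0.1)
ll_grid <- sapply(grid, loglik)
grid_max <- grid[which.max(ll_grid)]
print(grid_max)
```

Plotting ll_grid against grid, for example with plot(grid, ll_grid, type = "l"), shows the log-likelihood as a curve with a single peak.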
Exercise 1.8 Generating bivariate data
Generate 50 data points from two random variables X and Y, where \(X\sim Normal(50,100)\) and \(Y\sim Normal(100,20)\). The correlation between the random variables is 0.7. Plot the simulated data points from Y against those from X.
Exercise 1.9 Generating multivariate data
The bivariate case can be generalized to more than two dimensions. Generate 50 data points from three random variables X, Y, and Z, where \(X\sim Normal(50,100)\), \(Y\sim Normal(100,20)\), and \(Z\sim Normal(200,50)\). The correlation between the random variables X and Y is 0.5, between X and Z is 0.2, and between Y and Z is 0.7. Here, you will have to define a \(3\times 3\) variance-covariance matrix, with the variances on the diagonal and the pairwise covariances in the off-diagonals. Plot the simulated data points as two-dimensional figures: Y against X, Y against Z, and X against Z.
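A variance-covariance matrix of the kind needed in Exercises 1.8 and 1.9 can be assembled from standard deviations and a correlation matrix, since each covariance equals the correlation multiplied by the two standard deviations. The sketch below uses mvrnorm from the MASS package as one possible sampler; the means, standard deviations, correlations, sample size, and seed are illustrative assumptions and deliberately differ from the exercise values. It also assumes the spread parameters are standard deviations; if they are variances, take square roots first.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)  # arbitrary seed so the sketch is reproducible

# Illustrative values only, not the ones in the exercises.
means <- c(0, 10, 20)
sds <- c(2, 3, 4)
Rho <- matrix(c(1.0, 0.3, 0.1,
                0.3, 1.0, 0.6,
                0.1, 0.6, 1.0), nrow = 3, byrow = TRUE)

# Covariance between variables i and j is rho_ij * sd_i * sd_j;
# the diagonal then holds the variances sd_i^2.
Sigma <- diag(sds) %*% Rho %*% diag(sds)

sim <- mvrnorm(n = 200, mu = means, Sigma = Sigma)
colnames(sim) <- c("X", "Y", "Z")

# One scatterplot per pair of variables.
plot(sim[, "X"], sim[, "Y"], xlab = "X", ylab = "Y")
plot(sim[, "Z"], sim[, "Y"], xlab = "Z", ylab = "Y")
plot(sim[, "Z"], sim[, "X"], xlab = "Z", ylab = "X")
```

The bivariate case is the same construction with a \(2\times 2\) correlation matrix. Calling cor(sim) afterwards gives a rough check that the sample correlations are near the intended ones.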