1.9 Exercises

1.9.1 Practice using the `pnorm` function

1.9.1.1 Part 1

Given a normal distribution with mean 57 and standard deviation 100, use the `pnorm` function to calculate the probability of obtaining values between 84 and 222 from this distribution.

1.9.1.2 Part 2

Given a normal distribution with mean 51 and standard deviation 4, calculate the probability of getting

a score of 41 or less
a score of 41 or more
a score of 54 or more

1.9.1.3 Part 3

Given a normal distribution with mean 51 and standard deviation 7, what is the probability of getting

a score of 46 or less.
a score between 48 and 54.
a score of \(\mu+1\) or more, where \(\mu\) is the mean of the distribution.
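
The three kinds of question above (an area to the left, an area to the right, and an area between two values) each map onto a different way of calling pnorm. A minimal sketch, assuming an illustrative distribution with mean 100 and standard deviation 15 that is not one of the distributions in the exercises:

```r
# Illustrative values (assumptions, not taken from the exercises)
mu <- 100
sigma <- 15

# P(X <= 115): pnorm returns the lower-tail area by default
p_below <- pnorm(115, mean = mu, sd = sigma)

# P(X >= 115): ask for the upper tail directly (equivalent to 1 - p_below)
p_above <- pnorm(115, mean = mu, sd = sigma, lower.tail = FALSE)

# P(85 <= X <= 115): difference of two lower-tail areas
p_between <- pnorm(115, mean = mu, sd = sigma) - pnorm(85, mean = mu, sd = sigma)

round(c(p_below, p_above, p_between), 4)
```

Because 85 and 115 lie one standard deviation on either side of the mean, the last value should be about 0.68, which is a quick sanity check on the subtraction.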

1.9.2 Practice using the `qnorm` function

1.9.2.1 Part 1

Consider a normal distribution with mean 1 and standard deviation 1.

Compute the lower and upper boundaries such that:

the area (the probability) to the left of the lower boundary is 0.37.
the area (the probability) to the left of the upper boundary is 0.94.
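
Since qnorm is the inverse of pnorm, it takes an area to the left and returns the boundary that cuts off that area. A minimal sketch of this round trip, assuming a standard normal distribution and a probability of 0.975 chosen purely for illustration:

```r
# Illustrative distribution and probability (assumptions, not from the exercise)
mu <- 0
sigma <- 1

# Boundary with 97.5% of the area to its left
q <- qnorm(0.975, mean = mu, sd = sigma)

# Round trip: pnorm recovers the probability we started from
p <- pnorm(q, mean = mu, sd = sigma)

c(quantile = q, probability = p)
```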

1.9.2.2 Part 2

Consider a normal distribution with mean 50.643 and standard deviation 0.736. There exist two quantiles, the lower quantile q1 and the upper quantile q2, that are equidistant from the mean 50.643, such that the area under the Normal density curve between q1 and q2 is 0.85. Find q1 and q2.
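
One way to approach a problem of this kind is to split the probability that falls outside the interval equally between the two tails and pass the resulting tail areas to qnorm. The following is a sketch under assumed values (mean 100, standard deviation 15, coverage 0.95); none of these numbers come from the exercise.

```r
# Illustrative values (assumptions, not from the exercise)
mu <- 100
sigma <- 15
coverage <- 0.95

# Split the leftover probability equally between the two tails
tail_prob <- (1 - coverage) / 2

lower <- qnorm(tail_prob, mean = mu, sd = sigma)
upper <- qnorm(1 - tail_prob, mean = mu, sd = sigma)

c(lower = lower, upper = upper)

# Checks: the bounds are equidistant from the mean ...
c(mu - lower, upper - mu)
# ... and the area between them equals the requested coverage
pnorm(upper, mean = mu, sd = sigma) - pnorm(lower, mean = mu, sd = sigma)
```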

1.9.3 Maximum likelihood estimation 1

The function dnorm gives the likelihood given a data point (or multiple data points) and a value for the mean and the standard deviation (sd). Using dnorm, compute

the likelihood of the data point 11.664 assuming a mean of 12 and standard deviation 5.
the likelihood of the data point 11.664 assuming a mean of 11 and standard deviation 5.
the likelihood of the data point 11.664 assuming a mean of 10 and standard deviation 5.
the likelihood of the data point 11.664 assuming a mean of 9 and standard deviation 5.
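
Because dnorm is vectorized over its arguments, several candidate means can be evaluated in a single call. A minimal sketch, assuming a made-up data point (0.3), a standard deviation of 1, and a hypothetical set of candidate means:

```r
# Illustrative data point and candidate means (assumptions, not from the exercise)
x <- 0.3
candidate_means <- c(-1, 0, 1, 2)

# One call evaluates the likelihood of x under every candidate mean
lik <- dnorm(x, mean = candidate_means, sd = 1)

data.frame(mean = candidate_means, likelihood = lik)

# The candidate mean closest to the data point has the highest likelihood
candidate_means[which.max(lik)]
```

With a fixed standard deviation, the likelihood falls as the candidate mean moves away from the observed value, which is the pattern to look for in the exercise.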

1.9.4 Maximum likelihood estimation 2

You are given \(10\) independent and identically distributed data points that are assumed to come from a Normal distribution with unknown mean and unknown standard deviation:

##  [1] 503 487 511 488 516 501 488 484 493 505

The function dnorm gives the likelihood given multiple data points and a value for the mean and the standard deviation. The log-likelihood can be computed by typing dnorm(...,log=TRUE).

The product of the likelihoods for two independent data points can be computed like this: Suppose we have two independent and identically distributed data points 5 and 10. Then, assuming that the Normal distribution they come from has mean 10 and standard deviation 2, the joint likelihood of these is:

dnorm(5,mean=10,sd=2)*dnorm(10,mean=10,sd=2)

## [1] 0.001748

It is easier to do this on the log scale, because then one can add instead of multiplying. This is because \(\log(x\times y)= \log(x) + \log(y)\). For example:

log(2*3)

## [1] 1.792

log(2) + log(3)

## [1] 1.792

So the joint log likelihood of the two data points is:

dnorm(5,mean=10,sd=2,log=TRUE)+dnorm(10,mean=10,sd=2,log=TRUE)

## [1] -6.349

Even more compactly:

sum(dnorm(c(5,10),mean=10,sd=2,log=TRUE))

## [1] -6.349

Given the 10 data points above, calculate the maximum likelihood estimate (MLE) of the expectation.
Compute the sum of the log-likelihoods of the data x, using the MLE from the sample as the mean and a standard deviation of 5.
What is the sum of the log-likelihoods if the mean used to compute them is 495.6?
Which value for the mean, the MLE or 495.6, gives the higher log-likelihood?
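
For a Normal likelihood, the MLE of the mean is the sample mean. The sketch below illustrates this on hypothetical data (not the data from the exercise), with the standard deviation treated as known; the function name loglik and the numerical cross-check with optimize are illustrative choices, not part of the exercise.

```r
# Hypothetical data (an assumption; not the data from the exercise)
y <- c(4.1, 5.3, 6.0, 4.8, 5.5)
sigma <- 1  # standard deviation treated as known for this sketch

# Log-likelihood of the whole sample as a function of the mean
loglik <- function(mu) sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))

# Analytical MLE of the mean: the sample mean
mu_hat <- mean(y)

# Numerical cross-check: maximize the log-likelihood over the range of the data
opt <- optimize(loglik, interval = range(y), maximum = TRUE)

c(sample_mean = mu_hat, numerical_max = opt$maximum)

# Any other value of the mean gives a lower log-likelihood
loglik(mu_hat) > loglik(mu_hat + 0.5)
```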

1.9.5 Generating bivariate data

Generate 50 data points from two random variables X and Y, where \(X\sim Normal(50,100)\) and \(Y\sim Normal(100,20)\). The correlation between the random variables is 0.7. Plot the simulated data points from Y against those from X.
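
One possible route is mvrnorm from the MASS package, which takes a mean vector and a variance-covariance matrix. The sketch below uses illustrative parameters that differ from those in the exercise. It assumes the spreads are given as standard deviations, so they are squared on the diagonal, and each covariance is the correlation multiplied by the two standard deviations.

```r
library(MASS)  # provides mvrnorm

set.seed(1)

# Illustrative parameters (assumptions, not those of the exercise)
mu <- c(0, 10)
sd_x <- 1
sd_y <- 2
rho <- 0.5

# Variances on the diagonal; covariance = rho * sd_x * sd_y off the diagonal
Sigma <- matrix(c(sd_x^2,            rho * sd_x * sd_y,
                  rho * sd_x * sd_y, sd_y^2),
                nrow = 2, byrow = TRUE)

xy <- mvrnorm(n = 200, mu = mu, Sigma = Sigma)
colnames(xy) <- c("X", "Y")

# The sample correlation should be near rho, up to sampling noise
cor(xy[, "X"], xy[, "Y"])

plot(xy[, "X"], xy[, "Y"], xlab = "X", ylab = "Y")
```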

1.9.6 Generating multivariate data

The bivariate case can be generalized to more than two dimensions. Generate 50 data points from three random variables X, Y, and Z, where \(X\sim Normal(50,100)\), \(Y\sim Normal(100,20)\), and \(Z\sim Normal(200,50)\). The correlation between the random variables X and Y is 0.5, between X and Z is 0.2, and between Y and Z is 0.7. Here, you will have to define a \(3\times 3\) variance-covariance matrix, with the pairwise covariances in the off-diagonals. Plot the simulated data points as two-dimensional figures: Y against X, Y against Z, and X against Z.
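
A compact way to build a variance-covariance matrix in any number of dimensions is to scale a correlation matrix by the standard deviations on both sides. The following sketch uses illustrative parameters that are not those of the exercise. The names Rho and sds are hypothetical, and pairs is used here only as a quick way to view every pairwise scatterplot at once.

```r
library(MASS)  # provides mvrnorm

set.seed(1)

# Illustrative parameters (assumptions, not those of the exercise)
mu  <- c(0, 5, 10)
sds <- c(1, 2, 3)
Rho <- matrix(c(1.0, 0.3, 0.1,
                0.3, 1.0, 0.4,
                0.1, 0.4, 1.0),
              nrow = 3, byrow = TRUE)

# Scale the correlation matrix by the standard deviations:
# Sigma[i, j] = Rho[i, j] * sds[i] * sds[j]
Sigma <- diag(sds) %*% Rho %*% diag(sds)

xyz <- mvrnorm(n = 200, mu = mu, Sigma = Sigma)
colnames(xyz) <- c("X", "Y", "Z")

# All pairwise scatterplots in one figure
pairs(xyz)
```

Converting Sigma back with cov2cor(Sigma) should return the original correlation matrix, which is a useful check that the off-diagonal entries were filled in correctly.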