1.7 Summary of random variable theory

We can summarize the above informal concepts relating to random variables very compactly if we re-state them in mathematical form. A mathematical statement has the advantage not only of brevity but also of reducing ambiguity.

Formally, a random variable \(Y\) is defined as a function from a sample space of possible outcomes \(S\) to the real number system:

\[\begin{equation} Y : S \rightarrow \mathbb{R} \end{equation}\]

The random variable associates with each outcome \(\omega \in S\) exactly one number \(Y(\omega) = y\). The set of all possible values of \(Y\) is called the support of \(Y\) and is written \(S_Y\); every possible value \(y\) thus satisfies \(y \in S_Y\).
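
To make this concrete, here is a minimal Python sketch (purely illustrative; the coin-flip sample space and all values are hypothetical choices) of a random variable as a function from outcomes to numbers:

```python
# A minimal sketch (illustration only): two flips of a fair coin as the
# sample space S, with Y defined as the number of heads in an outcome.
from itertools import product

S = list(product(["H", "T"], repeat=2))  # S = {HH, HT, TH, TT}

def Y(omega):
    """The random variable Y: maps an outcome omega to a real number."""
    return sum(1 for flip in omega if flip == "H")

S_Y = {Y(omega) for omega in S}  # the support of Y
print(S_Y)  # {0, 1, 2}
```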

Every random variable \(Y\) has associated with it a probability mass function (PMF) if it is discrete, or a probability density function (PDF) if it is continuous. The PMF maps every element of \(S_Y\) to a value between 0 and 1. The PDF does not directly yield probabilities for individual values; instead, a probability is obtained by integrating the PDF over a range of values in the support of \(Y\), and it is this probability that lies between 0 and 1 (e.g., \(P(a \leq Y\leq b) \in [0, 1]\)). For the discrete case, we can write:

\[\begin{equation} p_Y : S_Y \rightarrow [0, 1] \end{equation}\]

Probability mass functions (discrete case) and probability density functions (continuous case) are thus functions that assign probabilities or relative likelihoods (densities) to events in a sample space.
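
Continuing the coin-flip illustration, a PMF can be sketched as a simple mapping from the support to probabilities:

```python
# Illustration: the PMF of the number of heads in two fair coin flips.
# Each element of the support S_Y = {0, 1, 2} maps to a value in [0, 1],
# and the values sum to 1.
p_Y = {0: 0.25, 1: 0.50, 2: 0.25}

assert all(0 <= p <= 1 for p in p_Y.values())
assert sum(p_Y.values()) == 1.0
```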

The expression

\[\begin{equation} Y \sim f(\cdot) \end{equation}\]

will be used to mean that the random variable \(Y\) has PDF/PMF \(f(\cdot)\). For example, if we say that \(Y \sim Binomial(n,\theta)\), then we are asserting that the PMF is:

\[\begin{equation} \hbox{Binomial}(k|n,\theta) = \binom{n}{k} \theta^{k} (1-\theta)^{n-k} \end{equation}\]
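
As a quick sketch (assuming Python with scipy; the values \(n=10\), \(\theta=0.5\), and \(k=2\) are arbitrary choices for illustration), the formula can be checked against a standard implementation:

```python
# Illustration with arbitrary values n = 10, theta = 0.5, k = 2:
# the Binomial PMF computed from the formula, checked against scipy.
from math import comb
from scipy.stats import binom

n, theta, k = 10, 0.5, 2
by_hand = comb(n, k) * theta**k * (1 - theta) ** (n - k)
print(by_hand)                 # 0.0439453125
print(binom.pmf(k, n, theta))  # same value
```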

If we say that \(Y\sim Normal(\mu,\sigma)\), we are asserting that the PDF is

\[\begin{equation} \hbox{Normal}(y|\mu,\sigma)= \frac{1}{\sqrt{2\pi \sigma^2}} \exp \left(-\frac{(y-\mu)^2}{2\sigma^2} \right) \end{equation}\]
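
Again as a sketch with arbitrary values (\(\mu = 0\), \(\sigma = 0.1\)), the density formula can be evaluated directly; notice that a density can exceed 1, which is one way to see that a PDF value is not itself a probability:

```python
# Illustration with arbitrary values mu = 0, sigma = 0.1: the Normal
# density evaluated from the formula, checked against scipy. Note that
# the density at a point can exceed 1; a PDF value is not a probability.
import numpy as np
from scipy.stats import norm

mu, sigma, y = 0.0, 0.1, 0.0
by_hand = 1 / np.sqrt(2 * np.pi * sigma**2) * np.exp(-((y - mu) ** 2) / (2 * sigma**2))
print(by_hand)                           # about 3.99
print(norm.pdf(y, loc=mu, scale=sigma))  # same value
```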

The cumulative distribution function or CDF is defined as follows:

For discrete distributions, the probability that \(Y\) is less than or equal to some value \(a\) is written:

\[\begin{equation} P(Y\leq a) = F(Y\leq a) =\sum_{y \leq a} f(y) \end{equation}\]
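
A sketch of this summation, reusing the illustrative Binomial example from above (here with the arbitrary value \(a = 3\)):

```python
# Illustration: the CDF of a Binomial(n = 10, theta = 0.5) random
# variable at a = 3, as a running sum of PMF values.
from scipy.stats import binom

n, theta, a = 10, 0.5, 3
by_sum = sum(binom.pmf(k, n, theta) for k in range(a + 1))  # sum over y <= a
print(by_sum)                  # about 0.1719
print(binom.cdf(a, n, theta))  # same value
```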

For continuous distributions, the summation symbol \(\sum\) above is replaced by its continuous counterpart, the integral \(\int\). The lower and upper bounds are marked by adding a subscript and a superscript to the integral sign. For example, if we want the area under the curve of some function \(f(y)\) between points \(a\) and \(b\), we write \(\int_a^b f(y)\, dy\). So, if we want the probability that \(Y\) is less than \(a\), we would write:

\[\begin{equation} P(Y<a) = F(Y<a) =\int_{-\infty}^{a} f(y)\, dy \end{equation}\]

The above integral is simply summing up the area under the curve between the points \(-\infty\) and \(a\); this gives us the probability of observing \(a\) or a value smaller than \(a\).
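
A numerical sketch of this integral, for the (arbitrarily chosen) standard Normal distribution:

```python
# Illustration for a standard Normal (an arbitrary choice): the CDF at
# a = 1 obtained by numerically integrating the PDF from -infinity to a.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

a = 1.0
area, _ = quad(norm.pdf, -np.inf, a)  # area under f(y) from -inf to a
print(area)         # about 0.8413
print(norm.cdf(a))  # same value, from the closed-form CDF
```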

We can compute quantities like \(P(Y>a)\) using the complementary cumulative distribution function \(1-F(a)\), and quantities like \(P(a\leq Y\leq b)\) by computing \(F(b)-F(a)\), where \(b>a\).
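
A sketch of both computations for a standard Normal, with the arbitrary choices \(a=0\) and \(b=1\):

```python
# Illustration for a standard Normal, with arbitrary choices a = 0, b = 1.
from scipy.stats import norm

a, b = 0.0, 1.0
print(1 - norm.cdf(a))            # P(Y > a), about 0.5
print(norm.sf(a))                 # the same, via the survival function
print(norm.cdf(b) - norm.cdf(a))  # P(a <= Y <= b), about 0.3413
```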

A final point here is that we can go back and forth between the PDF and the CDF. If the PDF is \(f(y)\), then the CDF that allows us to compute quantities like \(P(Y<b)\) is just the integral:

\[\begin{equation} F(Y<b)=\int_{-\infty}^b f(y)\, dy \end{equation}\]

The above is simply computing the area under the curve \(f(y)\), ranging from \(-\infty\) to \(b\).

Because differentiation is the inverse of integration (this fact is known as the Fundamental Theorem of Calculus), if we differentiate the CDF, we get the PDF back:

\[\begin{equation} \frac{d}{dy}F(y)=f(y) \end{equation}\]
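
A numerical sketch: approximating the derivative of the standard Normal CDF with finite differences recovers the PDF:

```python
# Illustration: a central finite-difference approximation to the
# derivative of the standard Normal CDF recovers the PDF.
from scipy.stats import norm

y, h = 0.5, 1e-6
deriv = (norm.cdf(y + h) - norm.cdf(y - h)) / (2 * h)
print(deriv)        # about 0.3521
print(norm.pdf(y))  # same value
```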

In bivariate distributions, the joint CDF is written \(F_{X,Y}(a,b)=P(X\leq a, Y\leq b)\), where \(-\infty < a,b<\infty\). The marginal distributions \(F_X\) and \(F_Y\) are the CDFs of the associated individual random variables. The CDF of \(X\):

\[\begin{equation} F_X(a) = P(X\leq a) = F_{X,Y}(a,\infty) \end{equation}\]

The CDF of \(Y\):

\[\begin{equation} F_Y(b) = P(Y\leq b) = F_{X,Y}(\infty,b) \end{equation}\]
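
A simulation sketch of the first of these identities, using a hypothetical bivariate Normal with correlation \(0.5\): setting the bound on \(Y\) to \(\infty\) makes that constraint vacuous, leaving only the marginal constraint on \(X\):

```python
# Simulation sketch with a hypothetical bivariate Normal (correlation 0.5):
# estimating F_{X,Y}(a, inf), which should match the marginal CDF F_X(a).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
cov = [[1.0, 0.5], [0.5, 1.0]]
xy = rng.multivariate_normal(mean=[0, 0], cov=cov, size=100_000)

a = 0.5
F_joint = np.mean((xy[:, 0] <= a) & (xy[:, 1] <= np.inf))  # Y-bound is vacuous
print(F_joint)      # close to 0.6915
print(norm.cdf(a))  # the marginal CDF F_X(a)
```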

\(f(x,y)\) is the joint PDF of \(X\) and \(Y\). Every joint PDF satisfies

\[\begin{equation} f(x,y)\geq 0\mbox{ for all }(x,y)\in S_{X,Y}, \end{equation}\] and \[\begin{equation} \iint_{S_{X,Y}}f(x,y)\,\mathrm{d} x\,\mathrm{d} y=1, \end{equation}\]

where \(S_{X,Y}\) is the joint support of the two random variables.

If \(X\) and \(Y\) are jointly continuous, they are individually continuous, and their marginal PDFs can be recovered from the joint PDF. For the marginal distribution of \(X\):

\[\begin{equation} \begin{split} P(X\in A) = & P(X\in A, Y\in (-\infty,\infty)) \\ = & \int_A \int_{-\infty}^{\infty} f(x,y)\,dy\, dx\\ = & \int_A f_X(x)\, dx \end{split} \end{equation}\]

where

\[\begin{equation} f_X(x) = \int_{-\infty}^{\infty} f(x,y)\, dy \end{equation}\]

Similarly:

\[\begin{equation} f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\, dx \end{equation}\]
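
Finally, a sketch of marginalization by numerical integration, using a hypothetical bivariate Normal joint distribution whose marginal for \(X\) is a standard Normal:

```python
# Illustration: marginalizing a hypothetical bivariate Normal joint PDF
# (correlation 0.5) by numerically integrating out y; the marginal of X
# is a standard Normal.
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

joint = multivariate_normal(mean=[0, 0], cov=[[1.0, 0.5], [0.5, 1.0]])

def f_X(x):
    """Marginal PDF of X, obtained by integrating f(x, y) over y."""
    return quad(lambda y: joint.pdf([x, y]), -np.inf, np.inf)[0]

x = 0.7
print(f_X(x))       # about 0.3123
print(norm.pdf(x))  # same value
```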