Chapter 1 Some important facts about distributions

In linguistics and psychology, typical data-sets involve either discrete dependent measures, such as acceptability ratings on a Likert scale (for example, ranging from 1 to 7) or binary grammaticality judgements, or continuous dependent measures, such as reading times or reaction times in milliseconds and EEG signals in microvolts.

Whenever we fit a model using one of these types of dependent measures, we make some assumptions about how the measurements were generated. In particular, we usually assume that our observed measurements come from a particular distribution. The normal distribution is one example that may be familiar to the reader.

In this chapter, we will learn how to make explicit the assumptions about the distribution associated with our data; we will also learn to visualize distributions. In order to do this, we need to understand the concept of a random variable, which presupposes some basic knowledge of probability theory (such as the sum and product rules). As will become apparent in this chapter, it is extremely useful to be able to think about data in terms of the underlying random variable producing the data. We consider the two cases, discrete and continuous, separately.

We will explain the terms random variable and distribution below through examples. But it is useful to define the notion of random variable formally.

A random variable, which will be denoted by a variable such as \(Y\), is defined as a function from a sample space of possible outcomes \(S\) to the real number system:

\[\begin{equation} Y : S \rightarrow \mathbb{R} \end{equation}\]

The random variable associates to each outcome \(\omega\) in the sample space \(S\) (\(\omega \in S\)) exactly one number \(Y(\omega) = y\). \(S_Y\) will represent a set that contains all the \(y\)’s (all the possible values of \(Y\), which we call the support of \(Y\)). We can compactly write: \(y \in S_Y\).
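The definition above can be made concrete with a small sketch. The code below (a minimal illustration with hypothetical names, not taken from this chapter) defines a random variable for two tosses of a coin: the sample space \(S\) contains the four possible outcomes, and the function `Y` maps each outcome \(\omega\) to a number, here the number of heads.

```python
# Sample space S: all outcomes omega of tossing a coin twice.
S = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]

def Y(omega):
    """The random variable Y: S -> R, here counting the number of heads."""
    return sum(1 for toss in omega if toss == "H")

# The support S_Y: the set of all values y that Y can take.
S_Y = {Y(omega) for omega in S}
print(sorted(S_Y))  # [0, 1, 2]
```

Each outcome is mapped to exactly one number (for example, `Y(("H", "T"))` is 1), and the support \(S_Y = \{0, 1, 2\}\) collects all the values \(y\) that \(Y\) can take.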

Every random variable \(Y\) has associated with it a probability mass function (PMF) or a probability density function (PDF). The term PMF is used for discrete distributions, and PDF for continuous distributions. One distinction between the PMF and the PDF is crucial: the PMF maps every individual element of \(S_Y\) to a probability between 0 and 1, whereas the PDF assigns probabilities to ranges of values \(r \subseteq S_Y\) rather than to individual points (examples are coming up). For both PMFs and PDFs, we will express this as follows:

\[\begin{equation} p_Y : S_Y \rightarrow [0, 1] \end{equation}\]

Probability mass functions (discrete case) and probability density functions (continuous case) are functions that assign probabilities to discrete events or to continuous ranges of values, respectively, in a sample space.

The meanings of the terms PMF, PDF, etc., will become clearer as we discuss examples below. The reader should revisit the above definition when we present some concrete examples of discrete and continuous random variables below.