Sir Ronald Fisher once had a conversation with a woman who claimed to be able to tell whether the tea or the milk had been added first to a cup. Fisher, being interested in probability, decided to test her claim empirically by presenting her with 8 randomly ordered cups of tea: 4 with milk added first and 4 with tea added first. The lady was then asked to select the 4 cups prepared with one method, and she was allowed to compare the cups directly (e.g., by tasting them sequentially or in pairs).

The lady identified each cup correctly. Do we believe that this could happen by random chance alone?
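We can answer this question with a simulation. The sketch below (a minimal illustration, not Fisher's original analysis) assumes the lady cannot actually distinguish the cups, so her 4 selections amount to picking 4 of the 8 cups at random; we then estimate how often such random guessing identifies all 4 milk-first cups:

```python
import random

# 8 cups: 4 milk-first ("M") and 4 tea-first ("T").
cups = ["M"] * 4 + ["T"] * 4

trials = 100_000
all_correct = 0
for _ in range(trials):
    guesses = random.sample(cups, 4)  # under H0: 4 cups chosen at random
    if guesses.count("M") == 4:       # all 4 picks happen to be milk-first
        all_correct += 1

phat = all_correct / trials
print(phat)
```

The estimate should be close to $1/\binom{8}{4} = 1/70 \approx 0.014$, so getting all 8 cups right by guessing alone is quite unlikely.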

When we use simulations to examine a hypothesis, we create a distribution that (over many, many simulations) begins to look like our theoretical distribution. This means that simulation-based tests and theory-based tests should come to similar conclusions most of the time. In fact, theory-based tests require some additional assumptions that simulation-based tests do not, so in many cases simulation-based tests work even when theory-based tests do not.

In one-sample tests of categorical variables, we typically want to know whether the proportion of successes (the quantity we're interested in) is equal to a specific value (that is, $\pi = 0.5$ or something of that sort). Our population parameter, $\pi$, represents the unknown population quantity, and our sample statistic, $\hat p$, represents what we know about the value of $\pi$.

In these tests, our null hypothesis is that $\pi = a$, where $a$ is chosen relative to the problem. Often, $a$ is equal to 0.5, because that value usually corresponds to random chance.

When simulating these experiments, we will often use a coin flip (for random chance) or a spinner (for other values of $\pi$) to generate data.
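A "spinner" is just a mechanism that produces a success with probability $\pi$ on each trial. The sketch below shows the idea with made-up numbers (the null value $\pi = 0.25$, sample size $n = 40$, and observed $\hat p = 0.40$ are all illustrative, not from the text): simulate many samples under the null hypothesis and count how often the simulated $\hat p$ is at least as large as the observed one.

```python
import random

pi_null = 0.25        # hypothesized proportion under H0 (illustrative)
n = 40                # hypothetical sample size
observed_phat = 0.40  # hypothetical observed sample proportion

trials = 10_000
count_extreme = 0
for _ in range(trials):
    # one "spin" per observation: success with probability pi_null
    successes = sum(random.random() < pi_null for _ in range(n))
    if successes / n >= observed_phat:
        count_extreme += 1

pval = count_extreme / trials
print(pval)  # approximate one-sided p-value
```

For a null value of $\pi = 0.5$, the same loop amounts to flipping $n$ fair coins, which is why a coin flip suffices for random-chance nulls.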

One-sample experiments with a continuous variable cannot easily be simulated, because we do not usually know enough about the characteristics of the population to generate data from it. Instead, we use theory-based tests for continuous one-sample data.

In a two-sample test, there are two groups of participants which are assigned different treatments. The goal is to see how the two treatments differ. Because there are two groups, the mathematical formula for calculating the standardized statistic is slightly more complicated (because the variability of $\overline{X}_A - \overline{X}_B$ is a bit more complicated), but in the end that statistic is compared to a similar reference distribution.

The statistic calculated will be $\overline x_1 - \overline x_2$. We will use a null hypothesis of $\mu_1 - \mu_2 = 0$.
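Unlike the one-sample continuous case, the two-sample comparison can be simulated with a permutation (re-randomization) test: if the null hypothesis is true, the group labels are arbitrary, so we can shuffle them and recompute the difference in means many times. A minimal sketch, with made-up group data:

```python
import random

# Illustrative data for two treatment groups (not from the text).
group_a = [12.1, 9.8, 11.5, 10.7, 13.0, 10.2]
group_b = [9.5, 8.7, 10.1, 9.9, 8.4, 9.0]

observed = sum(group_a)/len(group_a) - sum(group_b)/len(group_b)

pooled = group_a + group_b
n_a = len(group_a)

trials = 10_000
count_extreme = 0
for _ in range(trials):
    random.shuffle(pooled)                    # re-randomize the group labels
    sim_a, sim_b = pooled[:n_a], pooled[n_a:]
    diff = sum(sim_a)/len(sim_a) - sum(sim_b)/len(sim_b)
    if abs(diff) >= abs(observed):            # two-sided comparison
        count_extreme += 1

pval = count_extreme / trials
print(observed, pval)
```

The shuffled differences form the reference distribution; the p-value is the fraction of them at least as extreme as the observed $\overline x_1 - \overline x_2$.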

When we have data that consists of two continuous variables, we generally use linear regression to fit a regression line to the data. This line minimizes the errors in $y$, and is sometimes called the least squares regression line.

The regression line, $\hat{y} = a x + b$, consists of a slope and an intercept. If there is no linear relationship between $x$ and $y$, then we would expect $a = 0$.

We can use hypothesis testing to assess whether the value of $a$ is likely to have occurred by random chance if there is no relationship between $x$ and $y$ using a hypothesis test just like we used in previous sections.

The statistic calculated will be the slope of the line, $a$. We will use a null hypothesis of $a = 0$.
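The same permutation idea applies here: if there is no relationship between $x$ and $y$, shuffling the $y$ values should not systematically change the fitted slope. The sketch below (with illustrative data) computes the least squares slope, then shuffles $y$ many times to see how often a slope at least as extreme arises by chance:

```python
import random

# Illustrative (x, y) data with an apparent linear trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]

def slope(x, y):
    """Least squares slope: sum((x-xbar)(y-ybar)) / sum((x-xbar)^2)."""
    n = len(x)
    mx, my = sum(x)/n, sum(y)/n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx)**2 for xi in x)
    return sxy / sxx

observed = slope(xs, ys)

trials = 2_000
count_extreme = 0
shuffled = ys[:]
for _ in range(trials):
    random.shuffle(shuffled)                       # break any x-y association
    if abs(slope(xs, shuffled)) >= abs(observed):  # two-sided comparison
        count_extreme += 1

pval = count_extreme / trials
print(observed, pval)
```

A small p-value indicates that a slope this far from 0 would rarely occur if $x$ and $y$ were unrelated, which is evidence against the null hypothesis $a = 0$.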