## Classroom Resources

A collection of examples – both silly and serious – for illustrating key ideas from statistical thinking.

### Classroom resources: Selection Bias

#### The Mother of All Sampling Biases

Natural Parenting, by xkcd.

"On one hand, every single one of my ancestors going back billions of years has managed to figure it out. On the other hand, that's the mother of all sampling biases."

#### Success Stories are Selection Biased

How the Cases You Choose Affect the Answers You Get: Selection Bias in Comparative Politics, by Barbara Geddes.

"This article demonstrates how the selection of cases for study on the basis of outcomes on the dependent variable biases conclusions. It first lays out the logic of explanation and shows how it is violated when only cases that have achieved the outcome of interest are studied. It then examines three well-known and highly regarded studies in the field of comparative politics, comparing the conclusions reached in the original work with a test of the arguments on cases selected without regard for their position on the dependent variable. In each instance, conclusions based on the uncorrelated sample differ from the original conclusions."

### Classroom resources: Models

#### Correlation and Causation from "The West Wing"

"PRESIDENT BARTLET: [Post hoc, ergo propter hoc]. After it, therefore because of it. It means one thing follows the other, therefore it was caused by the other. But it's not always true, in fact it's hardly ever true. We did not lose Texas because of the hat joke."

Link; quote begins at 1m10s.

## Thinking Statistically Key Terms

**Selection bias** is the distortion that creeps into your inferences when you use a non-random sample but treat it as if it were random.
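A toy simulation (hypothetical numbers) makes the point concrete: a modest random sample tracks the population mean, while a selected sample — here, drawn only from the top half of a skewed distribution — stays badly biased no matter how large it gets.

```python
import random

random.seed(0)

# Hypothetical population: 10,000 incomes with a long right tail.
population = [random.lognormvariate(10, 1) for _ in range(10_000)]

def mean(xs):
    return sum(xs) / len(xs)

true_mean = mean(population)

# A random sample: unbiased estimate of the population mean.
random_sample = random.sample(population, 500)

# A selected sample: only people above the median income respond.
cutoff = sorted(population)[len(population) // 2]
selected_sample = [x for x in population if x >= cutoff][:500]

print(f"true mean:            {true_mean:,.0f}")
print(f"random-sample mean:   {mean(random_sample):,.0f}")
print(f"selected-sample mean: {mean(selected_sample):,.0f}")
```

The selected-sample estimate is pulled far above the truth, and taking a bigger selected sample would not fix it — that is what makes selection bias different from ordinary sampling noise.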

A **model** is a simplified or abstracted description of a system that captures the essence of what that system does.

A **dependent variable** is so called because it can’t vary freely within our model: its value is *dependent* on the values taken by the inputs.

**Independent** variables are so called because their variation should not be determined by any of the variables in the equation. Like teenagers, independent variables won’t let anyone tell them what to do or be.

An **error term** sweeps up any random variation in outcomes and represents it as a single term. Error terms only work if the variation they encode is truly random. If we accidentally create a model where the variation is **systematic**, not random, we can run into lots of trouble.
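A small sketch, using made-up numbers, shows why a truly random error term is harmless: in a hypothetical model where exam score depends on hours studied plus mean-zero noise, the noise averages out over many observations.

```python
import random

random.seed(1)

# Hypothetical model: score = 40 + 5 * hours + error,
# where the error term is truly random (mean zero).
def simulate_score(hours):
    error = random.gauss(0, 3)  # random, mean-zero error term
    return 40 + 5 * hours + error

# Over many observations the random errors cancel out,
# so the average lands near the model's prediction (40 + 5*6 = 70).
scores = [simulate_score(6) for _ in range(10_000)]
avg = sum(scores) / len(scores)
print(f"average score for 6 hours studied: {avg:.1f}")
```

If instead the "error" depended systematically on hours studied, it would not average out — that is exactly the systematic-variation trap the definition warns about.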

The **planning fallacy** is the tendency for people to (consistently) under-estimate how long it will take them to complete a given task.

**Correlation**, loosely defined, means that two variables change in relationship with each other: for example, a rise in sneezing is accompanied by a rise in punching. **Causation**, loosely defined, means that one thing *directly caused* the other thing to happen.

**Omitted variable bias** occurs, essentially, when we *omit* a *variable* from our model that has a significant impact on the outcome.
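A hedged illustration with invented data: suppose (hypothetically) that ability drives both education and income. Regressing income on education alone — omitting ability — credits education with ability's effect, so the estimated slope comes out well above the true value of 2.

```python
import random

random.seed(2)

# Hypothetical data-generating process:
#   income = 2 * education + 5 * ability + noise
# and education itself is correlated with ability.
n = 5_000
ability = [random.gauss(0, 1) for _ in range(n)]
education = [a + random.gauss(0, 1) for a in ability]
income = [2 * e + 5 * a + random.gauss(0, 1)
          for e, a in zip(education, ability)]

# One-variable least-squares slope of income on education.
def slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

b = slope(education, income)
# The true effect of education is 2, but the estimate is pulled
# upward because omitted ability affects both variables.
print(f"estimated effect of education: {b:.2f}")
```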

**Exogenous variation** is variation that originates outside the system of interest.

**Endogenous** variation originates *within* the system of interest.

A **conditional probability** is the probability of one event happening *given* that another did. Statisticians write **P(X|Y)** to represent the probability of X happening given that Y did.
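With raw counts, P(X|Y) is just a ratio: the share of Y-cases in which X also happened. A tiny worked example with hypothetical survey numbers:

```python
# Hypothetical survey of 100 people.
# X = "owns an umbrella", Y = "lives in a rainy city".
both = 30      # people for whom X and Y both hold
y_total = 40   # people for whom Y holds at all

# P(X|Y) = P(X and Y) / P(Y); with counts, the 100s cancel.
p_x_given_y = both / y_total
print(p_x_given_y)  # 0.75
```

So among rainy-city residents, 30 of 40 — a conditional probability of 0.75 — own an umbrella, even though only 30% of the whole sample does.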

A **hypothesis** is a causal explanation for why something happened.

An **alternative hypothesis** is a different possible causal explanation for why something happened – different to the main hypothesis we're exploring at the moment.

The **prior probability** for a hypothesis is the probability for it before we see any new evidence.

The **posterior probability** for a hypothesis is the probability for it *after* we've incorporated what we've learned from the new evidence.

The **base rate fallacy** occurs whenever you neglect to take account of the base rate, a.k.a. the prior probability that something was true before new evidence was introduced.
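The prior/posterior/base-rate trio fits into one calculation via Bayes' rule. A sketch with hypothetical numbers — a disease with a 1% base rate, a test that catches 90% of true cases but also flags 8% of healthy people:

```python
# Hypothetical numbers for a screening test.
prior = 0.01                 # base rate: P(disease)
p_pos_given_disease = 0.90   # P(positive | disease)
p_pos_given_healthy = 0.08   # P(positive | no disease)

# Total probability of a positive result.
p_pos = (p_pos_given_disease * prior
         + p_pos_given_healthy * (1 - prior))

# Bayes' rule: posterior = P(disease | positive).
posterior = p_pos_given_disease * prior / p_pos
print(f"P(disease | positive) = {posterior:.1%}")
```

Despite the test sounding "90% accurate", the posterior is only about 10%, because the tiny base rate means most positives come from the large pool of healthy people. Ignoring that prior — jumping straight from the test's accuracy to a near-certain diagnosis — is the base rate fallacy in action.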