Statistics are weird...
by Chad Jones
March 30, 2013
Intuition, in science, can often get you into a lot of trouble. Things like the wave/particle duality described in quantum mechanics, horseshoe orbits, and relativity seem to bend our minds in ways they just don't want to go. An implicit trust in your intuition will often lead you to conclusions that are very wrong. The math that gives us some important statistics is a great example of this. Statistics can often be very weird. Here are two examples of what I mean:
The Birthday Problem
How many people do you have to put together in a room until it is more likely than not that two of them will share a birthday (assuming a random distribution of birthdays)?
Intuitive (and wrong) answer:
Well, let's think about this. There are 365 days. "More likely than not" means a 50% chance. Since the birthdays are in a random distribution it won't be likely to get the same birthday twice until half of the days are already taken. So after 182 people are in a room together more than half of the days of the year will be taken and it will be more likely than not that the next person will share a birthday with one of the other people. So the answer must be 183, right?
But in reality the answer is:
Would you be surprised to hear that the answer is 23?
Even though 23 different birthdays would leave 93.7% of the days in the year available, there is a 50.7% chance that two people in a room of 23 people will share a birthday. What's even more surprising is that by the time 57 people are in a room together there is a 99% chance that two of them share a birthday. My intuitive reasoning would be closer to the mark if the question were worded "How many people should I invite into a room before it is more likely than not that one of them shares a birthday with me?" (although even then the answer is 253, not 183, because the guests can share birthdays with each other). But since we don't care which two people share a birthday, every pair in the room is a chance for a match, and the number lowers drastically. Your birthday doesn't make you special, sorry...
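If you'd rather check these numbers than take my word for it, here is a minimal Python sketch. It assumes 365 equally likely birthdays (no leap years), and the function names are my own, made up for illustration. The trick is to compute the chance that everyone's birthday is different and subtract that from 1.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def p_shares_with_me(n, days=365):
    """Probability that at least one of n guests has my particular birthday."""
    return 1.0 - ((days - 1) / days) ** n


def smallest_group(prob_fn, threshold=0.5):
    """Smallest n for which prob_fn(n) is greater than the threshold."""
    n = 1
    while prob_fn(n) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_group(p_shared_birthday))  # any two people in the room
    print(p_shared_birthday(23))
    print(p_shared_birthday(57))
    print(smallest_group(p_shares_with_me))   # someone matching me specifically
```

Under these assumptions the first search comes back with 23, while the "shares a birthday with me" version needs 253 guests.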
The Monty Hall Problem
A game show host, Monty Hall, shows you three doors. You know that behind one of the doors is a new car. Behind each of the other doors is a goat. You choose a door at random (we'll say door #1). Then, Monty opens door #3 and reveals a goat. Monty then makes you an offer: you can switch and take what's behind door #2, or stick with your original choice of door #1. Should you switch or stick with your initial choice?
The intuitive (and wrong) answer:
When I made my choice I had a one in three chance of getting a car. Monty opened one door, and it wasn't the car. That means that the car is either behind door #1 or door #2. Since it's equally likely to be behind each door it doesn't matter statistically which one I choose.
But in reality the answer is:
The real answer to the Monty Hall problem is that by switching your choice you move from a 1/3 chance of winning a car to a 2/3 chance. It's important to note that Monty knows where the car is and will never open a door to reveal it (that would ruin the game). I'll show you the mathematical reason why this is true. The statistical tool we use is called Bayes' theorem, which lets us analyze the probability of one event happening given that another event has already occurred. In this case: what is the probability that the car is behind door #1, given that Monty reveals a goat behind door #3? The math behind Bayes' theorem is written as:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Which is read "The probability that event A will happen given that B is true is equal to the probability that B will happen given that event A has happened multiplied by the probability of A divided by the probability of B."
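For the Monty Hall problem we have three important variables: the door you choose (Dn), the door Monty opens (Mn), and the door that actually has a car (Cn). So the probability that the car is behind door #2 (C2), given that you chose door #1 (D1) and Monty opened door #3 (M3) is:
$$P(C_2 \mid D_1, M_3) = \frac{P(M_3 \mid C_2, D_1)\,P(C_2 \mid D_1)}{P(M_3 \mid D_1)}$$
If the car is behind door #2, Monty is forced to open door #3, so the first term on top is 1. Your choice tells you nothing about where the car is, so the second term is 1/3. And, assuming Monty picks at random when he has two goats to choose from, he opens door #3 half of the time overall, so the bottom is 1/2.
Which works out to be:
$$P(C_2 \mid D_1, M_3) = \frac{1 \times \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}$$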
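The same Bayes calculation can be sketched in a few lines of Python using exact fractions. The function name is hypothetical, and the sketch assumes that Monty picks uniformly at random when he has two goat doors to choose from (the usual assumption, though the puzzle as stated doesn't spell it out).

```python
from fractions import Fraction

DOORS = (1, 2, 3)


def posterior_car(chosen=1, opened=3):
    """P(car behind each door | you chose `chosen`, Monty opened `opened`)."""
    prior = Fraction(1, 3)  # the car is equally likely to be behind any door
    joint = {}
    for car in DOORS:
        # Monty may only open a door that is neither your pick nor the car.
        allowed = [d for d in DOORS if d != chosen and d != car]
        # Assumption: Monty chooses uniformly among the doors he is allowed to open.
        likelihood = Fraction(1, len(allowed)) if opened in allowed else Fraction(0)
        joint[car] = likelihood * prior
    evidence = sum(joint.values())  # P(Monty opens `opened` | your choice)
    return {car: p / evidence for car, p in joint.items()}


if __name__ == "__main__":
    for door, p in posterior_car(chosen=1, opened=3).items():
        print(f"door #{door}: {p}")
```

With door #1 chosen and door #3 opened, this gives 1/3 for door #1, 2/3 for door #2, and 0 for door #3.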
So, if you choose door #1 and Monty reveals door #3, there is a 66.7% chance that the car is behind door #2 and only a 33.3% chance that it is behind door #1. If you're still not convinced, I have another two explanations on my blog post, found here.
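And if math isn't your thing, you can simply play the game a lot of times and count. Here is a rough simulation sketch in Python. The function names, the number of trials, and the random seed are arbitrary choices of mine, and Monty is again assumed to pick at random whenever he has two goat doors available.

```python
import random


def play(switch, rng):
    """Play one round; return True if the player ends up with the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a goat door that is not the player's pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither the original pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won when always switching (or always sticking)."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stick: ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Over many rounds the sticking strategy should win roughly a third of the time and the switching strategy roughly two thirds.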
So next time you're faced with a statistics problem that seems simple, just remember: statistics are weird.
@Skeptoid Media, a 501(c)(3) nonprofit