Read Allen Downey's Think Stats (second edition) and Think Bayes to get up to speed with core ideas in statistics and how to approach them programmatically. Both books are freely available online in full, or you can buy physical copies if you prefer.
<img src="img/think_bayes.png" title="Think Bayes" style="float: left;" />
Some people enjoy video content such as Khan Academy's Probability and Statistics or the much longer and more in-depth Harvard Statistics 110. You might also be interested in the book Statistics Done Wrong or a very short overview from School of Data.
Complete the following exercises.
Communicate the problem, how you solved it, and the solution, within each of the following markdown files. (You can include code blocks and images within markdown.)
- Think Stats Chapter 2 Exercise 4 (Cohen's d)
- Think Stats Chapter 3 Exercise 1 (actual vs. biased)
- Think Stats Chapter 4 Exercise 2 (a random distribution)
- Think Stats Chapter 5 Exercise 1 (blue men)
- Think Stats Chapter 6 Exercise 1 (household income)
- Think Stats Chapter 7 Exercise 1 (weight vs. age)
- Think Stats Chapter 8 Exercise 2 (sampling distribution)
- Think Stats Chapter 8 Exercise 3 (scoring)
- Think Stats Chapter 9 Exercise 2 (resampling)
Elvis Presley had a twin brother who died at birth. What is the probability that Elvis was an identical twin?
From Wikipedia: "The twin birth rate in the United States rose 76% from 1980 through 2009, from 18.9 to 33.3 per 1,000 births." "Monozygotic twinning occurs in birthing at a rate of about three in every 1000 deliveries worldwide."
Assuming twin birth rate in the US was the same in 1935 as it was in 1980 (probably not a valid assumption, but may be close enough to use), and given the above information:
P(A|B) = Probability Elvis was an identical twin, given that he was a twin.
P(A) = probability Elvis was an identical twin = 0.3%
P(B) = probability Elvis was any type of twin = 1.89%
P(B|A) = probability Elvis was any type of twin, given that he was an identical twin = 100%
Using Bayes' theorem:
P(A|B) = (P(A) * P(B|A)) / P(B)
P(A|B) = (0.003 * 1) / 0.0189
P(A|B) = 0.1587 = 15.87%
Code:

    def Bayes(prob_a, prob_b, prob_bgivena):
        # Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B)
        return (prob_a * prob_bgivena) / prob_b

    print(Bayes(.003, .0189, 1))
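As a sanity check on the arithmetic above, the same answer can be approximated by simulation. This is only a sketch under the assumptions already stated in the text (0.3% of births are identical twins, 1.89% are twins of any kind); the function name `simulate_identical_share`, the sample size, and the seed are made up for illustration.

```python
import random

def simulate_identical_share(n_births=1_000_000, p_identical=0.003,
                             p_twin=0.0189, seed=0):
    """Estimate P(identical | twin) by simulating births.

    Assumes every identical twin is also counted as a twin, so the
    identical-twin outcomes are a subset of the twin outcomes.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    twins = 0
    identical = 0
    for _ in range(n_births):
        u = rng.random()
        if u < p_twin:           # this birth is a twin of some kind
            twins += 1
            if u < p_identical:  # ...and specifically an identical twin
                identical += 1
    return identical / twins

share = simulate_identical_share()
print(round(share, 3))
```

With a million simulated births the estimate should land close to the 15.87% computed by hand, with some sampling noise in the third decimal place.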
How do frequentist and Bayesian statistics compare?
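Frequentist statistics treats the hypothesis as fixed and asks how probable the observed data (or data more extreme) would be if the null hypothesis were true; it does not assign a probability to the hypothesis itself. Bayesian statistics combines prior knowledge with the observed data to compute the probability that a hypothesis is true.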
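To make the contrast concrete, here is a small sketch using a coin that lands heads 7 times in 10 flips. The data are made up for illustration, and the uniform prior and the grid approximation are assumptions chosen for simplicity, not the only way to do either analysis.

```python
from math import comb

heads, flips = 7, 10  # hypothetical data, for illustration only

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Frequentist: assume the null hypothesis (fair coin) and ask how likely
# a result at least this far from the expected 5 heads would be.
distance = abs(heads - flips / 2)
p_value = sum(binom_pmf(k, flips, 0.5)
              for k in range(flips + 1)
              if abs(k - flips / 2) >= distance)

# Bayesian: start from a uniform prior over the heads probability,
# weight each candidate value by the likelihood of the data, and
# normalize to get a posterior (approximated on a grid).
grid = [i / 1000 for i in range(1001)]
weights = [p**heads * (1 - p)**(flips - heads) for p in grid]
total = sum(weights)
posterior_favors_heads = sum(w for p, w in zip(grid, weights) if p > 0.5) / total

print("p-value under a fair coin:", round(p_value, 3))
print("P(coin favors heads | data):", round(posterior_favors_heads, 3))
```

The p-value (about 0.34) is a statement about the data given the hypothesis, while the posterior (roughly 0.89 under this prior) is a statement about the hypothesis given the data, so the two numbers answer different questions and should not be compared directly.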