
Bayes' Theorem

The mathematical tool that converts P(D|H) into P(H|D) — what you actually want to know

Bayes' Theorem: The Formula


P(H|D) = P(D|H) × P(H) / P(D)

This formula lets you calculate P(H|D) — the probability of a Hypothesis given the Data — which is what you actually want to know!

The Four Terms

P(H|D) — Posterior

Probability of Hypothesis given Data

What you want: "Given I tested positive, what's the probability I have the disease?"

P(H) — Prior

Probability of Hypothesis before seeing data

Your starting belief: "Before the test, what was my probability of having the disease?"

P(D|H) — Likelihood

Probability of Data given Hypothesis is true

Test sensitivity: "If I have the disease, what's the probability of testing positive?"

P(D) — Evidence

Total probability of seeing this data

Normalizer: "What's the overall probability of testing positive (from any cause)?"

Expanded Form

P(D) — the total probability of the data — can be expanded:

P(D) = P(D|H) × P(H) + P(D|¬H) × P(¬H)

"Probability of data if hypothesis true × prior that hypothesis is true
PLUS
Probability of data if hypothesis false × prior that hypothesis is false"

So the full formula becomes:

P(H|D) = P(D|H) × P(H) / [P(D|H) × P(H) + P(D|¬H) × P(¬H)]

In plain English:

Posterior = (Likelihood × Prior) / Evidence

Your updated belief = How well the data fits your hypothesis × Your prior belief, normalized by the total probability of seeing that data.
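
If you prefer to see this as code, here is a minimal Python sketch of the calculation (the function and argument names are just illustrative):

def bayes_posterior(prior, likelihood, false_positive_rate):
    # P(H|D) = P(D|H) × P(H) / [P(D|H) × P(H) + P(D|¬H) × P(¬H)]
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(D)
    return likelihood * prior / evidence

# Example: 1% prior, 90% sensitivity, 5% false positive rate
print(bayes_posterior(0.01, 0.90, 0.05))  # ≈ 0.154 (see the medical test example below)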

The Intuition: What Bayes' Theorem Does

PRIOR (what you believed before) + NEW DATA (evidence you observed) → POSTERIOR (what you believe now)

Bayes' Theorem is a belief updating machine. It tells you exactly how to rationally update your beliefs when you get new evidence.

The Core Insight

🎯 How Much Should Evidence Move Your Belief?

It depends on two things:

1. How likely is this evidence if your hypothesis is TRUE? P(D|H) — the likelihood
2. How likely is this evidence if your hypothesis is FALSE? P(D|¬H) — the false positive rate

The ratio of these determines how much the evidence should update your belief:

Likelihood Ratio = P(D|H) / P(D|¬H)

"How much more likely is this evidence if H is true vs if H is false?"

Three Scenarios

Likelihood Ratio > 1: Evidence is more likely if H is true → Belief in H goes UP

Likelihood Ratio = 1: Evidence equally likely either way → Belief unchanged

Likelihood Ratio < 1: Evidence is more likely if H is false → Belief in H goes DOWN

Why the Prior Matters

🔮 Same Evidence, Different Priors → Different Conclusions

Scenario: A medical test with 95% accuracy comes back positive.

Context                        Prior P(Disease)    Posterior P(Disease | Positive)
Random person, rare disease    0.1%                ~2%
Person with symptoms           10%                 ~68%
Family history + symptoms      50%                 ~95%

Same test, same result — but vastly different conclusions based on prior probability!

This is why context matters. A positive test means something very different for a symptomatic patient vs a random screening.
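
A quick sketch that reproduces the table above, assuming "95% accuracy" means 95% sensitivity with a 5% false positive rate (one reasonable reading of that phrase):

def bayes_posterior(prior, likelihood, false_positive_rate):
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

for prior in (0.001, 0.10, 0.50):
    print(f"Prior {prior:.1%} → Posterior {bayes_posterior(prior, 0.95, 0.05):.0%}")
# Prior 0.1% → Posterior 2%; Prior 10.0% → Posterior 68%; Prior 50.0% → Posterior 95%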

The Bayesian Mindset:

1. Start with your best estimate (prior)
2. Observe evidence
3. Ask: How much more/less likely is this evidence if my hypothesis is true?
4. Update your belief proportionally
5. Your new belief (posterior) becomes the prior for the next piece of evidence

This is rational belief updating — exactly what science SHOULD do.

Example: Medical Test (Step by Step)

📋 The Scenario

You take a test for a disease. Here are the facts:

  • Disease prevalence: 1% of people have this disease
  • Test sensitivity: 90% — if you HAVE the disease, test is positive 90% of the time
  • Test specificity: 95% — if you DON'T have the disease, test is negative 95% of the time (5% false positive)

You test POSITIVE. What's the probability you actually have the disease?

Step-by-Step Calculation

Step 1: Identify the terms
  • P(H) = P(Disease) = 0.01 — Prior (1% have disease)
  • P(¬H) = P(No Disease) = 0.99 — (99% don't have it)
  • P(D|H) = P(Positive|Disease) = 0.90 — Likelihood (90% sensitivity)
  • P(D|¬H) = P(Positive|No Disease) = 0.05 — False positive rate (5%)

Step 2: Calculate P(D), the total probability of testing positive

P(D) = P(D|H) × P(H) + P(D|¬H) × P(¬H)
P(D) = 0.90 × 0.01 + 0.05 × 0.99
P(D) = 0.009 + 0.0495
P(D) = 0.0585

So about 5.85% of all people will test positive (true positives + false positives).

Step 3: Apply Bayes' Theorem

P(H|D) = P(D|H) × P(H) / P(D)
P(H|D) = 0.90 × 0.01 / 0.0585
P(H|D) = 0.009 / 0.0585
P(H|D) = 0.154 = 15.4%

Result: Even with a positive test from a "90% accurate" test, there's only a 15.4% chance you actually have the disease!

Why So Low?

Imagine 10,000 people get tested:

10,000 people
├── 100 have disease (1%)
│   ├── 90 test positive (true positives) ✓
│   └── 10 test negative (false negatives)
└── 9,900 don't have disease (99%)
    ├── 9,405 test negative (true negatives)
    └── 495 test positive (false positives!)

Total positive tests: 90 + 495 = 585
True positives: 90
P(Disease | Positive) = 90 / 585 = 15.4%

Because the disease is rare (1%), even a small false positive rate (5%) produces more false positives than true positives!
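
The same frequency argument, sketched in code (exact counts, not a simulation):

population = 10_000
with_disease = int(population * 0.01)          # 100 people
without_disease = population - with_disease    # 9,900 people

true_positives = with_disease * 0.90           # 90
false_positives = without_disease * 0.05       # 495

print(true_positives / (true_positives + false_positives))  # 0.1538... ≈ 15.4%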

Example: Will It Rain?

🌧️ The Scenario

You wake up and see dark clouds. Should you bring an umbrella?

  • Base rate of rain in your city this time of year: 20%
  • P(Dark Clouds | Rain): 90% — when it rains, there are usually dark clouds
  • P(Dark Clouds | No Rain): 30% — sometimes clouds but no rain

Given dark clouds, what's P(Rain)?

Calculation

Given:
  • P(Rain) = 0.20 (prior)
  • P(Clouds|Rain) = 0.90 (likelihood)
  • P(Clouds|No Rain) = 0.30 (false positive)
  • P(No Rain) = 0.80

Step 1: Calculate P(Clouds)

P(Clouds) = P(Clouds|Rain) × P(Rain) + P(Clouds|No Rain) × P(No Rain)
P(Clouds) = 0.90 × 0.20 + 0.30 × 0.80
P(Clouds) = 0.18 + 0.24 = 0.42

Step 2: Apply Bayes' Theorem

P(Rain|Clouds) = P(Clouds|Rain) × P(Rain) / P(Clouds)
P(Rain|Clouds) = 0.90 × 0.20 / 0.42
P(Rain|Clouds) = 0.18 / 0.42
P(Rain|Clouds) = 0.43 = 43%

Interpretation:

Without seeing clouds, your belief in rain was 20%.
After seeing dark clouds, your belief updated to 43%.

The clouds are evidence for rain, but not conclusive — bring an umbrella, but don't cancel your picnic!

Notice the Likelihood Ratio

Likelihood Ratio = P(Clouds|Rain) / P(Clouds|No Rain) = 0.90 / 0.30 = 3

Dark clouds are 3× more likely when it's going to rain vs when it isn't.

This tells you clouds are decent evidence for rain (ratio > 1), but not overwhelming. If clouds were 10× more likely with rain, your belief would update more dramatically.
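
You can also check the 43% with the odds form from earlier: prior odds of rain = 0.20 / 0.80 = 0.25; posterior odds = 0.25 × 3 = 0.75; converting back, P(Rain | Clouds) = 0.75 / 1.75 ≈ 43%.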

Example: Email Spam Filter

This is how Bayesian spam filters actually work!

📧 The Scenario

An email contains the word "FREE". Is it spam?

  • Base rate: 40% of all emails are spam
  • P("FREE" | Spam): 70% — spam emails often contain "FREE"
  • P("FREE" | Not Spam): 10% — legitimate emails sometimes say "FREE"

Calculation

Given:
  • P(Spam) = 0.40
  • P("FREE"|Spam) = 0.70
  • P("FREE"|Not Spam) = 0.10
  • P(Not Spam) = 0.60

Step 1: Calculate P("FREE")

P("FREE") = 0.70 × 0.40 + 0.10 × 0.60
P("FREE") = 0.28 + 0.06 = 0.34

Step 2: Apply Bayes' Theorem

P(Spam|"FREE") = P("FREE"|Spam) × P(Spam) / P("FREE")
P(Spam|"FREE") = 0.70 × 0.40 / 0.34
P(Spam|"FREE") = 0.28 / 0.34
P(Spam|"FREE") = 0.82 = 82%

Result: An email containing "FREE" has an 82% probability of being spam (up from 40% base rate).

What Makes This Powerful

Real spam filters use many words and update sequentially:

Start:          P(Spam) = 40%
See "FREE":     P(Spam) → 82%
See "WINNER":   P(Spam) → 97%
See "ACT NOW":  P(Spam) → 99.5% → Mark as spam!

Each word provides evidence. The posterior from one update becomes the prior for the next. This is the power of Bayesian updating!
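
Here is a minimal sketch of that sequential update in Python. Only the numbers for "FREE" come from the example above; the per-word probabilities for "WINNER" and "ACT NOW" are invented for illustration, so the printed chain differs a bit from the one shown.

# Each entry is (P(word | Spam), P(word | Not Spam)).
# "FREE" uses the numbers from the example; the other two rows are assumed values.
word_probs = {
    "FREE":    (0.70, 0.10),
    "WINNER":  (0.60, 0.05),
    "ACT NOW": (0.50, 0.05),
}

p_spam = 0.40  # prior: 40% of all email is spam
for word, (p_if_spam, p_if_not_spam) in word_probs.items():
    evidence = p_if_spam * p_spam + p_if_not_spam * (1 - p_spam)
    p_spam = p_if_spam * p_spam / evidence   # posterior becomes the next prior
    print(f'After "{word}": P(Spam) = {p_spam:.1%}')
# With these assumed values: 82.4%, 98.2%, 99.8%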

Sequential Belief Updating

The real power of Bayes' Theorem is that you can keep updating as new evidence comes in. Today's posterior becomes tomorrow's prior.

Prior₁ + Data₁ → Posterior₁
        ↓ becomes
Prior₂ + Data₂ → Posterior₂
        ↓ becomes
Prior₃ + Data₃ → Posterior₃

Example: Coin Flipping

Someone hands you a coin. You're 80% sure it's fair, 20% sure it's biased (80% heads).

You flip it 5 times and get: H, H, T, H, H (4 heads, 1 tail)

How does your belief update with each flip?

Start: P(Fair) = 80%, P(Biased) = 20%

Flip 1: HEADS
  • P(H|Fair) = 0.50, P(H|Biased) = 0.80
  • Likelihood ratio = 0.80 / 0.50 = 1.6 (favors biased)
  • Update: P(Fair) → 71%, P(Biased) → 29%

Flip 2: HEADS
  • Likelihood ratio still 1.6
  • Update: P(Fair) → 61%, P(Biased) → 39%

Flip 3: TAILS
  • P(T|Fair) = 0.50, P(T|Biased) = 0.20
  • Likelihood ratio = 0.20 / 0.50 = 0.4 (favors fair!)
  • Update: P(Fair) → 80%, P(Biased) → 20%

Flip 4: HEADS
  • Update: P(Fair) → 71%, P(Biased) → 29%

Flip 5: HEADS
  • Update: P(Fair) → 60%, P(Biased) → 40%

Final: After 4H, 1T — you're now about 60% confident it's fair (down from 80%)

Key insights:

• Each piece of evidence moves your belief in the direction it supports
• Heads moves you toward "biased", tails moves you toward "fair"
• You never reach 0% or 100% — you just get more/less confident
• With enough evidence, you'd eventually figure out the truth
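
A short sketch that reproduces these updates (same setup: 80% prior on fair, biased coin lands heads 80% of the time):

p_fair = 0.80                      # prior belief that the coin is fair
HEADS_PROB = {"fair": 0.5, "biased": 0.8}

for i, flip in enumerate("HHTHH", start=1):
    p_flip_fair = HEADS_PROB["fair"] if flip == "H" else 1 - HEADS_PROB["fair"]
    p_flip_biased = HEADS_PROB["biased"] if flip == "H" else 1 - HEADS_PROB["biased"]
    evidence = p_flip_fair * p_fair + p_flip_biased * (1 - p_fair)
    p_fair = p_flip_fair * p_fair / evidence   # posterior becomes the next prior
    print(f"Flip {i} ({flip}): P(Fair) = {p_fair:.0%}")
# Prints 71%, 61%, 80%, 71%, 60%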

This Is How Science SHOULD Work

Bayesian Science:

1. Start with prior beliefs based on existing knowledge
2. Run experiment, collect data
3. Update beliefs based on how much the data supports/refutes hypotheses
4. New posterior = starting point for next experiment
5. Over time, converge toward truth

Instead, broken science asks: "Is p < 0.05? Yes? Publish. Done."
No priors, no updating, no convergence — just binary "significant" or "not."

Bayes' Theorem Calculator

Try it yourself: plug each scenario below into the formula and calculate P(H|D). (A code sketch that checks the answers follows the table.)


Try These Scenarios

Scenario                          P(H)    P(D|H)   P(D|¬H)
Rare disease, good test           0.01    0.90     0.05
Common disease, good test         0.10    0.90     0.05
Rare disease, perfect test        0.01    1.00     0.00
Drug trial (10% of drugs work)    0.10    0.80     0.05
Spam filter                       0.40    0.70     0.10
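
To check your answers, here is a small sketch that works through each scenario (same illustrative helper as above):

def bayes_posterior(prior, likelihood, false_positive_rate):
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

scenarios = [
    ("Rare disease, good test",        0.01, 0.90, 0.05),
    ("Common disease, good test",      0.10, 0.90, 0.05),
    ("Rare disease, perfect test",     0.01, 1.00, 0.00),
    ("Drug trial (10% of drugs work)", 0.10, 0.80, 0.05),
    ("Spam filter",                    0.40, 0.70, 0.10),
]

for name, prior, likelihood, fpr in scenarios:
    print(f"{name}: P(H|D) = {bayes_posterior(prior, likelihood, fpr):.1%}")
# 15.4%, 66.7%, 100.0%, 64.0%, 82.4%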