
S2 Chapter 1: The Binomial Distribution

S2 Statistics: Chapter 1 — The Binomial Distribution


Welcome, fellow mathematical detectives! Today we travel back to France in 1654 and step into the shoes of mathematicians to solve a puzzle that baffled the brightest minds of the era. Its solution helped give birth to an entirely new branch of mathematics, probability theory, and it leads directly to our chapter’s central topic: The Binomial Distribution.

The story unfolds with two equally skilled knights engaged in a contest that was abruptly interrupted, creating a problem that would revolutionize mathematical thinking forever.

Act I: The Interrupted Game — A Historical Dilemma


Picture this: Two equally skilled knights, Antoine and Blaise, are engaged in a dice-throwing competition in the royal court of France. The rules are elegantly simple:

  • The first knight to win 3 rounds claims the entire prize of 64 gold coins
  • Each round has an equal probability of being won by either knight
  • The rounds are independent of each other

Current situation: Antoine leads with a score of 2:1.

Suddenly, a royal summons arrives! The King requires their immediate presence, and the game must be stopped at once. This creates our central dilemma: how should the 64 gold coins be divided fairly between the two knights?

Before we dive into the mathematical solution, let’s consider some intuitive approaches:

Act II: The Genius Solution — Letters Between Mathematical Giants


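The knight Blaise (who happened to be the mathematician Blaise Pascal) wrote to his friend Pierre de Fermat seeking a solution. Their correspondence revealed a revolutionary insight: a fair division should depend not on the rounds already won, but on each knight’s probability of winning the match had play continued.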

To apply this insight, we need to determine what each knight needs to win:

  • Antoine needs to win 1 more round to reach 3 total wins
  • Blaise needs to win 2 more rounds to reach 3 total wins

Since each knight has equal skill ($p = 0.5$ for each round) and the rounds are independent, we can reframe our question: if the game had continued, what is the probability that each knight would have won it?

Had play continued, the match would have ended within at most 2 more rounds. Let’s enumerate all possible sequences:

Tree Diagram: [Figure: Game Tree]

Sequence Analysis:

  • A: Antoine wins in round 1 → Game over, Antoine wins ($P = 0.5$)
  • BA: Blaise wins round 1, Antoine wins round 2 → Antoine wins ($P = 0.5 \times 0.5 = 0.25$)
  • BB: Blaise wins both rounds → Blaise wins ($P = 0.5 \times 0.5 = 0.25$)

$$P(\text{Antoine wins ultimately}) = P(A) + P(BA) = 0.5 + (0.5 \times 0.5) = 0.75$$

$$P(\text{Blaise wins ultimately}) = P(BB) = 0.5 \times 0.5 = 0.25$$

Fair Distribution: The 64 coins should be divided in the ratio $0.75 : 0.25 = 3 : 1$.

  • Antoine receives: $64 \times 0.75 = 48$ coins
  • Blaise receives: $64 \times 0.25 = 16$ coins
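The sequence analysis can also be carried out mechanically by walking the game tree. Below is a minimal Python sketch. It assumes, as in the story, that every round is won independently with probability $\frac{1}{2}$. The function `play_out` and its "rounds still needed" arguments are our own illustrative choices, not part of the original problem. Exact fractions keep the shares as whole coins.

```python
from fractions import Fraction

def play_out(need_a, need_b, p=Fraction(1, 2), path=""):
    """Yield (sequence, probability, winner) for every way the game can finish.

    need_a, need_b: rounds Antoine and Blaise still need to win (illustrative names).
    p: probability that Antoine wins any single round (assumed constant).
    """
    if need_a == 0:
        yield path, Fraction(1), "Antoine"
        return
    if need_b == 0:
        yield path, Fraction(1), "Blaise"
        return
    # Branch 1: Antoine takes the next round
    for seq, prob, winner in play_out(need_a - 1, need_b, p, path + "A"):
        yield seq, p * prob, winner
    # Branch 2: Blaise takes the next round
    for seq, prob, winner in play_out(need_a, need_b - 1, p, path + "B"):
        yield seq, (1 - p) * prob, winner

outcomes = list(play_out(1, 2))   # at 2:1, Antoine needs 1 more win, Blaise needs 2
for seq, prob, winner in outcomes:
    print(seq, prob, winner)

p_antoine = sum(prob for _, prob, winner in outcomes if winner == "Antoine")
print("Antoine's share:", 64 * p_antoine, "coins")
print("Blaise's share:", 64 * (1 - p_antoine), "coins")
```

Running it lists A, BA and BB with probabilities 1/2, 1/4 and 1/4, then reports the 48 : 16 split. Other interrupted scores only need different arguments, e.g. `play_out(2, 3)`.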

Deep Dive: Uncovering the Binomial Pattern


The Binomial Distribution: Formal Framework


Jacob Bernoulli generalized this “fixed number of independent trials with constant success probability” model, creating what we now call the Binomial Distribution. Its probabilities are exactly the terms in the binomial expansion of $(p + q)^n$, where $q = 1 - p$. The term $\binom{n}{r}p^r q^{n-r}$ is the probability of exactly $r$ successes, hence the name.

Definition: Binomial Distribution

A random variable $X$ follows a binomial distribution, denoted $X \sim B(n, p)$, if it satisfies the BINS conditions:

  1. Binary outcomes: Each trial has exactly two possible outcomes (success/failure)
  2. Independence: Trials are mutually independent
  3. Number fixed: The number of trials $n$ is predetermined
  4. Same probability: The probability of success $p$ remains constant across trials

Where:

  • $n$ = number of trials
  • $p$ = probability of success on each trial
  • $X$ = number of successes in $n$ trials

Theorem: Binomial Probability Mass Function

For $X \sim B(n, p)$, the probability of exactly $r$ successes is:

$$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}$$

where $r = 0, 1, 2, \ldots, n$ and $\binom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$.
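The formula translates almost word for word into code. Here is a minimal Python sketch that uses the standard library's `math.comb` (Python 3.8+) for $\binom{n}{r}$. The name `binomial_pmf` and the values $n = 8$, $p = 0.3$ are arbitrary illustrations. Because the probabilities are the terms of $(q + p)^n = 1^n$, they should sum to 1.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p); zero outside r = 0, 1, ..., n."""
    if not 0 <= r <= n:
        return 0.0
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 8, 0.3
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
for r, pr in enumerate(probs):
    print(f"P(X = {r}) = {pr:.4f}")

# The terms of (q + p)^n with q = 1 - p add up to 1
print(sum(probs))  # approximately 1.0, up to floating-point rounding
```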

Theorem: Expectation and Variance

For $X \sim B(n, p)$:

  • Expected value: $E(X) = np$
  • Variance: $\text{Var}(X) = np(1-p)$
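The shortcuts $E(X) = np$ and $\text{Var}(X) = np(1-p)$ can be checked numerically against the definitions $E(X) = \sum r\,P(X = r)$ and $\text{Var}(X) = E(X^2) - [E(X)]^2$. This is a minimal Python sketch; the helper `binomial_moments` and the $(n, p)$ pairs are illustrative choices.

```python
from math import comb

def binomial_moments(n, p):
    """Return (E(X), Var(X)) for X ~ B(n, p) by summing over the PMF directly."""
    pmf = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(pmf))
    var = sum(r**2 * pr for r, pr in enumerate(pmf)) - mean**2  # E(X^2) - [E(X)]^2
    return mean, var

for n, p in [(2, 0.5), (8, 0.3), (50, 0.05)]:
    mean, var = binomial_moments(n, p)
    # direct sums next to the formulas np and np(1 - p)
    print(n, p, round(mean, 6), round(n * p, 6), round(var, 6), round(n * p * (1 - p), 6))
```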

Pattern Recognition: In our opening problem, imagine that both of the next 2 rounds are always played, even if the first one already settles the match. Antoine then wins the match exactly when he wins at least 1 of those 2 rounds.

If we let $X$ = number of rounds Antoine wins in the next 2 rounds, then $X \sim B(2, 0.5)$.

Using the binomial probability formula:

$$
\begin{aligned}
P(X \geq 1) &= P(X=1) + P(X=2) \\
P(X=1) &= \binom{2}{1}(0.5)^1(0.5)^1 = 2 \times 0.25 = 0.5 \\
P(X=2) &= \binom{2}{2}(0.5)^2(0.5)^0 = 1 \times 0.25 = 0.25 \\
P(X \geq 1) &= 0.5 + 0.25 = 0.75 \quad \checkmark
\end{aligned}
$$

This matches our exhaustive calculation and naturally leads us to the binomial distribution!
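The "play both rounds anyway" idea can also be confirmed by brute force. Here is a minimal Python sketch, assuming fair, independent rounds. It lists all four equally likely two-round sequences and keeps those in which Antoine wins at least one round, i.e. $X \geq 1$.

```python
from fractions import Fraction
from itertools import product

# Pretend both remaining rounds are always played ("A" = Antoine wins the round).
# All 2**2 = 4 sequences are equally likely, each with probability (1/2)**2.
sequences = list(product("AB", repeat=2))
prob_each = Fraction(1, 2) ** 2

# Antoine takes the match exactly when X = number of "A"s is at least 1
antoine_wins = [s for s in sequences if s.count("A") >= 1]

print(antoine_wins)                    # [('A', 'A'), ('A', 'B'), ('B', 'A')]
print(len(antoine_wins) * prob_each)   # 3/4
```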

Binomial Cumulative Distribution Table (Extract)


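The tabulated value is $P(X \leq x)$, where $X$ has a binomial distribution with index $n$ and parameter $p$.

| $n = 8$ | $p = 0.05$ | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 | 0.50 |
|---|---|---|---|---|---|---|---|---|---|---|
| $x = 0$ | 0.6634 | 0.4305 | 0.2725 | 0.1678 | 0.1001 | 0.0576 | 0.0319 | 0.0168 | 0.0084 | 0.0039 |
| $x = 1$ | 0.9428 | 0.8131 | 0.6572 | 0.5033 | 0.3671 | 0.2553 | 0.1691 | 0.1064 | 0.0632 | 0.0352 |
| $x = 2$ | 0.9942 | 0.9619 | 0.8948 | 0.7969 | 0.6785 | 0.5518 | 0.4278 | 0.3154 | 0.2201 | 0.1445 |
| $x = 3$ | 0.9996 | 0.9950 | 0.9786 | 0.9437 | 0.8862 | 0.8059 | 0.7064 | 0.5941 | 0.4770 | 0.3633 |
| $x = 4$ | 1.0000 | 0.9996 | 0.9971 | 0.9896 | 0.9727 | 0.9420 | 0.8939 | 0.8263 | 0.7396 | 0.6367 |
| $x = 5$ | 1.0000 | 1.0000 | 0.9998 | 0.9988 | 0.9958 | 0.9887 | 0.9747 | 0.9502 | 0.9115 | 0.8555 |
| $x = 6$ | 1.0000 | 1.0000 | 1.0000 | 0.9999 | 0.9996 | 0.9987 | 0.9964 | 0.9915 | 0.9819 | 0.9648 |
| $x = 7$ | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9999 | 0.9998 | 0.9993 | 0.9983 | 0.9961 |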
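Every entry in the table is just a running sum of PMF terms, so the extract can be regenerated. Here is a minimal Python sketch; the helper name `binomial_cdf` is ours. It prints the $n = 8$ rows to 4 decimal places, so a printed 1.0000 is a rounded value, not exactly 1.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p), summing the PMF from r = 0 to r = x."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

ps = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
n = 8
print("x  " + "  ".join(f"{p:.2f}  " for p in ps))
for x in range(n):
    print(f"{x}  " + "  ".join(f"{binomial_cdf(x, n, p):.4f}" for p in ps))
```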

Example: CATL Battery Production

Background: CATL produces lithium-ion batteries for electric vehicles. Based on historical data, their production process has a 95% success rate, meaning each battery independently has a 95% probability of meeting quality standards.

Scenario: A batch of 50 batteries has just been produced.

Part A: Basic Probability Questions

  1. What’s the probability of exactly 48 working batteries?
  2. What’s the expected number of defective batteries in this batch?
  3. What’s the standard deviation of the number of defective batteries?

Part B: Quality Control Decisions

  1. The company’s policy is to reject a batch if it contains 4 or more defective batteries. What’s the probability that this batch will be rejected?
  2. If the batch is accepted, what’s the probability that it contains at most 1 defective battery?

Part C: Cost Analysis

  1. Each defective battery costs $20 to replace under warranty. What’s the expected warranty cost for this batch?
  2. If the company wants to be 90% confident that warranty costs won’t exceed $100 for this batch, is the current quality level sufficient?
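Here is one way to set up all three parts, as a minimal Python sketch. It rests on these assumptions:

- Let $D$ be the number of defective batteries. $D \sim B(50, 0.05)$, since each battery independently fails the quality standard with probability $1 - 0.95$.
- Every defective item in Parts B and C is a defective battery.
- The total warranty cost is $20D$ dollars.
- "90% confident" means $P(\text{cost} \leq \$100) \geq 0.9$.

The helper names are illustrative.

```python
from math import comb, sqrt

def pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(pmf(r, n, p) for r in range(x + 1))

n, p_def = 50, 0.05   # D = number of defective batteries; assumed D ~ B(50, 0.05)

# Part A
p_48_working = pmf(2, n, p_def)           # 48 working <=> exactly 2 defective
mean_def = n * p_def                      # E(D) = np
sd_def = sqrt(n * p_def * (1 - p_def))    # sd(D) = sqrt(np(1 - p))

# Part B
p_reject = 1 - cdf(3, n, p_def)           # P(D >= 4) = 1 - P(D <= 3)
p_le1_given_accept = cdf(1, n, p_def) / cdf(3, n, p_def)   # P(D <= 1 | D <= 3)

# Part C (assumes total warranty cost = 20 * D dollars)
expected_cost = 20 * mean_def             # E(20D) = 20 E(D)
max_defects = 100 // 20                   # cost <= $100  <=>  D <= 5
p_cost_ok = cdf(max_defects, n, p_def)    # P(cost <= $100)
sufficient = p_cost_ok >= 0.90

print(f"A1: P(48 working)        = {p_48_working:.4f}")
print(f"A2: E(defective)         = {mean_def:.2f}")
print(f"A3: sd(defective)        = {sd_def:.4f}")
print(f"B1: P(reject)            = {p_reject:.4f}")
print(f"B2: P(D <= 1 | accepted) = {p_le1_given_accept:.4f}")
print(f"C1: E(warranty cost)     = ${expected_cost:.2f}")
print(f"C2: P(cost <= $100)      = {p_cost_ok:.4f}, sufficient: {sufficient}")
```

Part B2 is a conditional probability. "Accepted" means $D \leq 3$, so the answer is $P(D \leq 1)/P(D \leq 3)$.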

Example (June 05 Q1):

It is estimated that 4% of people have green eyes. In a random sample of size $n$, the expected number of people with green eyes is 5.

  1. Calculate the value of $n$.

The expected number of people with green eyes in a second random sample is 3.

  2. Find the standard deviation of the number of people with green eyes in this second sample.
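Both parts only need $E(X) = np$ and $\text{Var}(X) = np(1-p)$. The key step in part 2 is to recover the second sample size from $np = 3$. Here is a minimal Python sketch, assuming the second sample comes from the same population ($p = 0.04$). Exact fractions avoid rounding in $n$.

```python
from fractions import Fraction
from math import sqrt

p = Fraction(4, 100)             # 4% have green eyes

# (1) E(X) = np = 5  =>  n = 5 / p
n1 = Fraction(5) / p
print(n1)                        # 125

# (2) Second sample: np = 3  =>  n = 3 / p, then sd = sqrt(np(1 - p))
n2 = Fraction(3) / p
sd2 = sqrt(n2 * p * (1 - p))     # sqrt(3 * 0.96) = sqrt(2.88)
print(n2, round(sd2, 4))         # 75 1.6971
```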

Example (WST02/01/Jan17/1):

The random variable $X$ has the binomial distribution $B(20, 0.45)$.

  1. Find $P(X = 8)$.
  2. Find the probability that $X$ lies within one standard deviation of its mean.
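Part 2 is mostly bookkeeping. First find $\mu = np$ and $\sigma = \sqrt{np(1-p)}$, then sum the PMF over the integers in $[\mu - \sigma, \mu + \sigma]$. Here is a minimal Python sketch; the variable names are illustrative.

```python
from math import comb, sqrt, ceil, floor

n, p = 20, 0.45

def pmf(r):
    """P(X = r) for X ~ B(20, 0.45)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# (1) P(X = 8)
p8 = pmf(8)

# (2) mean and standard deviation
mu = n * p                        # np = 9
sigma = sqrt(n * p * (1 - p))     # sqrt(4.95)
lo, hi = ceil(mu - sigma), floor(mu + sigma)   # integer values in [mu - sigma, mu + sigma]
p_within = sum(pmf(r) for r in range(lo, hi + 1))

print(round(p8, 4), (lo, hi), round(p_within, 4))
```

With $\sigma \approx 2.22$, the interval runs from about 6.78 to 11.22. So the sum covers $7 \leq X \leq 11$, which is $P(X \leq 11) - P(X \leq 6)$ in cumulative-table terms.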

Example: Chikungunya Fever Testing

The AL High School of Guangdong Country Garden School decides to test all 1000 of its students for Chikungunya fever. At the time of testing, the prevalence of Chikungunya fever is 0.5% (i.e., 0.005).

Test Characteristics:

  • Sensitivity: 95% — If a student has Chikungunya fever, the test correctly identifies them 95% of the time
  • Specificity: 98% — If a student doesn’t have Chikungunya fever, the test correctly identifies them as negative 98% of the time

  1. Let $X$ be the number of students who actually have Chikungunya fever. What distribution does $X$ follow? Calculate the expected number of infected students.

Given that the number of infected students is 6:

  2. Among the infected students, let $Y$ be the number who test positive (true positives). What distribution does $Y$ follow? Calculate $P(Y \geq 5)$.
  3. Among the non-infected students, let $Z$ be the number who test positive (false positives). What distribution does $Z$ follow? Calculate the expected number of false positives.
  4. The Paradox: If a randomly selected student tests positive, what is the probability they actually have Chikungunya fever? Use your previous results to explain why such seemingly surprising results can occur.
  5. The school decides to retest all positive cases with a second, independent test (same sensitivity and specificity). If a student tests positive on both tests, what is the probability they actually have Chikungunya fever?
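Here is a minimal Python sketch of the whole chain. It rests on these assumptions:

- Students are infected independently, so $X \sim B(1000, 0.005)$.
- Given 6 infected students, $Y \sim B(6, 0.95)$ and $Z \sim B(994, 0.02)$.
- The two tests are independent given a student's true status.

For questions 4 and 5 it approximates "the probability a positive student is infected" by the ratio of expected true positives to expected total positives. The names are illustrative.

```python
from math import comb

def pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

N, prevalence = 1000, 0.005
sens, spec = 0.95, 0.98

# Q1: X ~ B(1000, 0.005), assuming independent infections
e_infected = N * prevalence                     # E(X) = np

infected = 6                                    # given in the question
healthy = N - infected                          # 994

# Q2: Y ~ B(6, 0.95); P(Y >= 5) = P(Y = 5) + P(Y = 6)
p_y_ge5 = pmf(5, infected, sens) + pmf(6, infected, sens)

# Q3: Z ~ B(994, 0.02): each healthy student tests positive with prob 1 - spec
e_false_pos = healthy * (1 - spec)              # E(Z) = np

# Q4: expected share of positives that are genuinely infected
e_true_pos = infected * sens
ppv_one = e_true_pos / (e_true_pos + e_false_pos)

# Q5: positive on both tests, assuming the tests are independent given true status
e_true_both = infected * sens**2
e_false_both = healthy * (1 - spec)**2
ppv_two = e_true_both / (e_true_both + e_false_both)

print(e_infected, round(p_y_ge5, 4), round(e_false_pos, 2),
      round(ppv_one, 4), round(ppv_two, 4))
```

This illustrates the paradox. About 5.7 expected true positives are swamped by about 19.9 expected false positives, because the healthy group is so much larger. Requiring two positives shrinks the false-positive count by a further factor of 50.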

Challenge Tasks: Probability Generating Function


Pascal and Fermat used exhaustive enumeration, and Bernoulli gave us a general formula; we now seek a single, unified expression: the Probability Generating Function (PGF). Just as the binomial expansion of $(p + q)^n$ contains every binomial probability as one of its terms, the PGF is one function from which all the probabilities, the expectation and the variance can be ‘generated’.
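For $X \sim B(n, p)$ the PGF has the closed form $G(t) = E(t^X) = (q + pt)^n$. Here is a minimal Python sketch, assuming the standard identities $E(X) = G'(1)$ and $\text{Var}(X) = G''(1) + G'(1) - [G'(1)]^2$. It expands $(q + pt)^n$ with exact fractions and checks that each coefficient of $t^r$ equals $P(X = r)$. It then reads off the mean and variance. The values $n = 8$, $p = 0.3$ and the helper name are arbitrary.

```python
from fractions import Fraction
from math import comb

def pgf_coefficients(n, p):
    """Coefficients of G(t) = (q + p t)^n, lowest power first, with q = 1 - p."""
    q = 1 - p
    coeffs = [Fraction(1)]                 # start from the constant polynomial 1
    for _ in range(n):                     # multiply by (q + p t), n times
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * q                # contribution of q
            nxt[k + 1] += c * p            # contribution of p t (raises the power by 1)
        coeffs = nxt
    return coeffs

n, p = 8, Fraction(3, 10)
c = pgf_coefficients(n, p)

# The coefficient of t^r is exactly P(X = r)
assert all(c[r] == comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1))

g1 = sum(r * cr for r, cr in enumerate(c))              # G'(1)
g2 = sum(r * (r - 1) * cr for r, cr in enumerate(c))    # G''(1)
print(g1, g2 + g1 - g1**2)                              # 12/5 42/25, i.e. np and np(1 - p)
```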