S2 Chapter 3: Approximations and the Central Limit Theorem
Preface: The Quest for Computational Simplicity
Welcome, mathematical problem-solvers! Today we embark on a fascinating journey that bridges the gap between mathematical precision and practical computation. We’ll discover how some of history’s greatest mathematicians overcame seemingly insurmountable computational challenges through the elegant art of approximation.
Our story begins in an era before computers, when calculating even simple probabilities could take hours or days of tedious arithmetic. The question that drove mathematical innovation was simple yet profound: How can we make the impossible, possible?
1. The Computational Crisis: When Precision Becomes Impractical
Setting the Stage: The 19th Century Dilemma
Imagine you’re a 19th-century insurance actuary, tasked with calculating risk probabilities to set fair premiums. You need to compute probabilities from distributions like or .
2. Poisson Approximation: The Lightweight Solution
Recalling the Theoretical Foundation
From our study of the Poisson distribution, we know that:
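Theorem (Poisson Limit of Binomial): As $n \to \infty$ and $p \to 0$ while maintaining $np = \lambda$ (constant), we have:
$$B(n, p) \to \text{Po}(\lambda), \quad \text{that is,} \quad \binom{n}{k} p^k (1-p)^{n-k} \to \frac{e^{-\lambda}\lambda^k}{k!} \ \text{for each fixed } k$$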
Practical Application Guidelines
Example 1 (Quality Control Application):
A factory produces 1000 components per day with a defect rate of 0.005. What’s the probability of exactly 5 defective components?
Exact Calculation: $P(X = 5) = \binom{1000}{5}(0.005)^5(0.995)^{995}$ where $X \sim B(1000, 0.005)$
This is computationally intensive!
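Poisson Approximation: Since $n = 1000$ is large, $p = 0.005$ is small, and $\lambda = np = 5$, we can use $X \approx \text{Po}(5)$, so $P(X = 5) \approx \frac{e^{-5}\, 5^5}{5!} \approx 0.1755$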
Much simpler to calculate!
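To see how close the two answers in Example 1 really are, here is a minimal Python sketch that evaluates both expressions. It uses only the standard library; the variable names are illustrative choices, not part of the original notes.

```python
from math import comb, exp, factorial

n, p, k = 1000, 0.005, 5   # values from Example 1
lam = n * p                # Poisson parameter: lambda = np = 5

# Exact binomial probability P(X = 5) for X ~ B(1000, 0.005)
exact = comb(n, k) * p**k * (1 - p)**(n - k)

# Poisson approximation P(X = 5) for X ~ Po(5)
approx = exp(-lam) * lam**k / factorial(k)

print(f"exact binomial        = {exact:.4f}")
print(f"Poisson approximation = {approx:.4f}")
print(f"absolute difference   = {abs(exact - approx):.5f}")
```

The two probabilities differ by less than 0.001, which is why the approximation is acceptable here even though the exact value is easy for a modern computer.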
Example 2 (Exercise):
In a certain region, of the population has lactose intolerance. A medical study randomly selects individuals from this population. Let represent the number of individuals WITHOUT lactose intolerance.
- Write down an expression for the exact value of
- Explain why the Poisson approximation is suitable for this problem.
- Using a Poisson approximation, estimate
Solution:
3. Normal Approximation: Breaking the Distribution Barriers
The Discovery of Universal Convergence
Normal Approximation to Binomial Distribution
Theorem (De Moivre-Laplace Theorem): If $X \sim B(n, p)$ where $n$ is large and $p$ is not too close to 0 or 1, then:
$$X \approx N(\mu, \sigma^2)$$
where $\mu = np$ and $\sigma^2 = np(1-p)$.
Rule of Thumb: Use when $np$ and $n(1-p)$ are both sufficiently large (commonly quoted thresholds are 5 or 10).
Visual Demonstration: Normal Approximation Quality
Binomial Distribution B(50, 0.3) with Normal Overlay
Imagine a bar chart of B(50, 0.3) with a red normal curve N(15, 10.5) overlaid — the match is remarkably close.
Mean = 15, Variance = 10.5
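The overlay described above can be checked numerically. The sketch below tabulates the B(50, 0.3) probabilities next to the N(15, 10.5) density at the same integer points; the helper names `binom_pmf` and `normal_pdf` are our own, introduced only for illustration.

```python
from math import comb, exp, pi, sqrt

n, p = 50, 0.3
mu = n * p                 # mean of B(50, 0.3)
var = n * p * (1 - p)      # variance of B(50, 0.3)

def binom_pmf(k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x):
    """Density of N(mu, var) at x."""
    return exp(-(x - mu)**2 / (2 * var)) / sqrt(2 * pi * var)

# One row per possible value of X: (k, bar height, curve height)
rows = [(k, binom_pmf(k), normal_pdf(k)) for k in range(n + 1)]
max_gap = max(abs(bar - curve) for _, bar, curve in rows)

for k, bar, curve in rows[8:23]:       # the region around the mean
    print(f"{k:2d}  bar = {bar:.4f}  curve = {curve:.4f}")
print(f"largest gap between bar and curve: {max_gap:.4f}")
```

Comparing the density with the bar height works because each bar has width 1, so the area of the bar and the height of the curve are on the same scale.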
Normal Approximation to Poisson Distribution
Just as the binomial distribution approaches normality, so does the Poisson distribution for large parameters.
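Theorem (Normal Approximation to Poisson): If $X \sim \text{Po}(\lambda)$ where $\lambda$ is large (a commonly used threshold is $\lambda > 10$), then:
$$X \approx N(\lambda, \lambda)$$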
Note the beautiful property: for Poisson distributions, the mean equals the variance!
Poisson Distribution Po(12) with Normal Overlay
Imagine a bar chart of Po(12) with a red normal curve N(12, 12) overlaid — again, a very close fit.
Mean = Variance = 12
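As a rough numerical check of this picture, the following sketch builds the Po(12) probabilities, confirms that the mean and variance computed from them are both 12, and measures how far the bars sit from the N(12, 12) curve. Truncating the table at 80 terms is our own simplification; the probability beyond that point is negligible.

```python
from math import exp, pi, sqrt

lam = 12

# Build P(X = k) iteratively: P(0) = e^(-lam), P(k) = P(k-1) * lam / k
pmf = [exp(-lam)]
for k in range(1, 80):
    pmf.append(pmf[-1] * lam / k)

# Mean and variance computed directly from the probabilities
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean)**2 * q for k, q in enumerate(pmf))

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return exp(-(x - mu)**2 / (2 * var)) / sqrt(2 * pi * var)

max_gap = max(abs(q - normal_pdf(k, lam, lam)) for k, q in enumerate(pmf))

print(f"mean = {mean:.6f}, variance = {variance:.6f}")
print(f"largest gap between bar and curve: {max_gap:.4f}")
```

The gap is somewhat larger than in the binomial picture because Po(12) is still visibly skewed to the right; it shrinks as $\lambda$ grows.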
4. Continuity Correction: The Bridge Between Discrete and Continuous
The Fundamental Challenge
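When we approximate a discrete distribution with a continuous one, we face a conceptual problem: a continuous random variable $Y$ assigns probability zero to every single value, so $P(Y = k) = 0$, while the discrete variable $X$ it approximates can have $P(X = k) > 0$.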
Continuity Correction Rules
Theorem (Continuity Correction): When approximating a discrete distribution with a continuous distribution, use these transformations:
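| Discrete | Continuous Approximation |
|---|---|
| $P(X = k)$ | $P(k - 0.5 < Y < k + 0.5)$ |
| $P(X \le k)$ | $P(Y < k + 0.5)$ |
| $P(X < k)$ | $P(Y < k - 0.5)$ |
| $P(X \ge k)$ | $P(Y > k - 0.5)$ |
| $P(X > k)$ | $P(Y > k + 0.5)$ |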
Example 3 (Continuity Correction in Practice):
A binomial random variable is approximated by . Find .
Without Continuity Correction: (meaningless!)
With Continuity Correction:
This gives a meaningful approximation that accounts for the discrete nature of the original distribution.
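The effect of the correction is easy to demonstrate in code. The sketch below does not use the numbers of Example 3; as an assumption for illustration it reuses $X \sim B(50, 0.3)$ from the earlier overlay and estimates $P(X = 15)$ with Python's standard `statistics.NormalDist`.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 50, 0.3, 15   # illustrative values, not those of Example 3
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Exact binomial probability P(X = 15)
exact = comb(n, k) * p**k * (1 - p)**(n - k)

# Without correction: the interval from 15 to 15 has zero width, hence zero area
no_correction = approx_dist.cdf(k) - approx_dist.cdf(k)

# With correction: P(X = 15) is replaced by P(14.5 < Y < 15.5)
with_correction = approx_dist.cdf(k + 0.5) - approx_dist.cdf(k - 0.5)

print(f"exact              = {exact:.4f}")
print(f"without correction = {no_correction:.4f}")
print(f"with correction    = {with_correction:.4f}")
```

Widening the single point to an interval of width 1 gives the normal curve an area to measure, and that area lands close to the exact binomial value.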
5. Guided Practice
Example 4:
For each scenario, determine the most appropriate approximation method:
- , find
- , find
- , find
Solution:
Example 5:
A confectionery company produces chocolate bars, and during a special promotion, they place a golden ticket in of the bars. A convenience store receives a shipment of chocolate bars.
- Write down a suitable distribution to model the number of bars containing golden tickets.
- State one assumption required for this model to be valid.
- Find the probability that exactly bars contain golden tickets.
- Using a normal approximation with continuity correction, estimate the probability that fewer than bars contain golden tickets.
- The store manager wants to be confident that at least customers will find golden tickets. Is this shipment size sufficient? Show your working.
Solution:
Homework Exercises
Exercise 1 (WST02/01/Jan15/7):
A multiple choice examination paper has questions where .
Each question has answers of which only is correct. A pass on the paper is obtained by answering or more questions correctly.
The probability of obtaining a pass by randomly guessing the answer to each question should not exceed .
Use a normal approximation to work out the greatest number of questions that could be used.
Exercise 2 (WST02/01/Jan16/3):
Left-handed people make up of a population. A random sample of people is taken from this population. The discrete random variable represents the number of left-handed people in the sample.
- Write down an expression for the exact value of
- Evaluate your expression, giving your answer to 3 significant figures.
- Using a Poisson approximation, estimate
- Using a normal approximation, estimate
- Give a reason why the Poisson approximation is a more suitable estimate of
Exercise 3 (WST02/01/Jan17/3):
- State the condition under which the normal distribution may be used as an approximation to the Poisson distribution.
The number of reported first aid incidents per week at an airport terminal has a Poisson distribution with mean 3.5.
- Find the modal number of reported first aid incidents in a randomly selected week. Justify your answer.
The random variable represents the number of reported first aid incidents at this airport terminal in the next 2 weeks.
- Find
- Given that there were exactly 6 reported first aid incidents in a 2 week period, find the probability that exactly 4 were reported in the first week.
- Using a suitable approximation, find the probability that in the next 40 weeks there will be at least 120 reported first aid incidents.
Exercise 4 (WST02/01/June17/2):
Crispy-crisps produces packets of crisps. During a promotion, a prize is placed in of the packets. No more than prize is placed in any packet. A box contains packets of crisps.
- Write down a suitable distribution to model the number of prizes found in a box.
- Write down one assumption required for the model.
- Find the probability that in randomly selected boxes, only box contains exactly prize.
- Find the probability that a randomly selected box contains at least prizes.
Neha buys boxes of crisps.
- Using a normal approximation, find the probability that no more than of the boxes contain at least prizes.
6. The Central Limit Theorem: The Ultimate Foundation
The Grand Unifying Theory
All our approximation methods point to a deeper truth, one of the most important theorems in all of mathematics:
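Theorem (Central Limit Theorem): Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with finite mean $\mu$ and variance $\sigma^2$.
As $n \to \infty$, the sum $S_n = X_1 + X_2 + \cdots + X_n$ approaches a normal distribution:
$$S_n \approx N(n\mu, n\sigma^2)$$
Equivalently, the standardized sum:
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \to N(0, 1)$$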
Discovering CLT Through Dice: A Visual Journey
Let’s see the magic of CLT in action using the simple example of rolling dice.
Example 6 (Rolling Dice and Finding Normality):
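Consider rolling a fair six-sided die. The outcome $X$ has:
- Uniform distribution: $P(X = k) = \frac{1}{6}$ for $k = 1, 2, \ldots, 6$
- Mean: $E(X) = 3.5$, Variance: $\text{Var}(X) = \frac{35}{12} \approx 2.92$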
Exercise: Calculate the Distribution for n = 2
Now let’s roll two independent dice and find the sum $S_2 = X_1 + X_2$:
- Complete the table below to find all possible outcomes and their sums:
| First die / Second die | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | | | | | | |
| 2 | | | | | | |
| 3 | | | | | | |
| 4 | | | | | | |
| 5 | | | | | | |
| 6 | | | | | | |
- Count the frequency of each sum and complete the probability distribution:
| Sum | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | |||||||||||
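- Calculate the mean $E(S_2)$ and the variance $\text{Var}(S_2)$ of the sum $S_2 = X_1 + X_2$ using the properties of sums:
$E(S_2) =$ ______
$\text{Var}(S_2) =$ ______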
Key Observation: Even with just two dice, we can see the distribution shifting from uniform (flat) to triangular (peaked). This is the beginning of the journey toward the normal distribution!
The Visual Journey of CLT:
Imagine three histograms side by side:
- n = 1: Perfectly uniform, flat and rectangular, Mean = 3.5
- n = 2: Triangular shape, the peak emerges, Mean = 7.0
- n large: Bell curve that closely approximates (but never exactly equals) the normal distribution, Mean = 3.5n
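The three histograms can be generated exactly rather than imagined. The sketch below computes the distribution of the sum of n fair dice by repeatedly adding one die at a time; the function name `dice_sum_distribution` and the choice n = 10 for "large" are our own illustrative assumptions.

```python
from fractions import Fraction

def dice_sum_distribution(n):
    """Exact distribution of the sum of n fair six-sided dice, as {sum: probability}."""
    dist = {0: Fraction(1)}            # before any roll the sum is 0 with certainty
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for face in range(1, 7):   # add one more die
                new[total + face] = new.get(total + face, Fraction(0)) + prob * Fraction(1, 6)
        dist = new
    return dist

for n in (1, 2, 10):
    dist = dice_sum_distribution(n)
    mean = sum(s * q for s, q in dist.items())
    print(f"n = {n}: mean = {float(mean)}")
    for s in sorted(dist):
        # Crude text histogram: one '#' per 0.5% of probability
        print(f"  {s:3d} {'#' * round(float(dist[s]) * 200)}")
```

Using `Fraction` keeps every probability exact, so the printed shapes reflect the true distributions and not rounding noise.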
Connecting CLT to Our Earlier Work
Now we can understand why our approximation methods work so well:
Example 7 (Why Binomial Becomes Normal):
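Recall that if $X \sim B(n, p)$, then $X$ can be written as:
$$X = Y_1 + Y_2 + \cdots + Y_n$$
where each $Y_i \sim B(1, p)$ is independent, with mean $p$ and variance $p(1-p)$.
By CLT, as $n$ gets large:
$$X \approx N(np, np(1-p))$$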
This is exactly the De Moivre-Laplace theorem we used earlier!
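A small simulation makes this decomposition concrete. In the sketch below each binomial value is built literally as a sum of 0/1 trials; the parameters n = 50 and p = 0.3, the seed, and the number of repetitions are assumptions chosen for illustration.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)          # fixed seed so the run is repeatable
n, p, trials = 50, 0.3, 20_000

def binomial_draw():
    """One value of X, built as a sum of n independent 0/1 trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

samples = [binomial_draw() for _ in range(trials)]
sample_mean = mean(samples)
sample_var = pvariance(samples)

print(f"sample mean     = {sample_mean:.3f}  (theory: np = {n * p})")
print(f"sample variance = {sample_var:.3f}  (theory: np(1-p) = {n * p * (1 - p)})")
```

The sample mean and variance should land close to $np$ and $np(1-p)$, the parameters of the approximating normal distribution.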
Example 8 (Why Poisson Becomes Normal):
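Similarly, if $X \sim \text{Po}(\lambda)$, we can express $X$ as the sum of many small independent Poisson variables.
For large integer $\lambda$, we can write $X$ as the sum of $\lambda$ independent $\text{Po}(1)$ variables. By CLT:
$$X \approx N(\lambda, \lambda)$$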
This explains our normal approximation to Poisson distribution!
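The additivity used in Example 8 can be verified directly. The sketch below adds twelve independent Po(1) variables by convolving their probability tables and compares the result with Po(12); the value 12, the table length, and the helper names are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(lam, size):
    """P(X = k) for k = 0, ..., size - 1 where X ~ Po(lam)."""
    return [exp(-lam) * lam**k / factorial(k) for k in range(size)]

def convolve(a, b, size):
    """Distribution of the sum of two independent variables, kept for values below size."""
    out = [0.0] * size
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < size:
                out[i + j] += x * y
    return out

size, lam = 60, 12
unit = poisson_pmf(1, size)          # one Po(1) variable

total = unit
for _ in range(lam - 1):             # add eleven more Po(1) variables
    total = convolve(total, unit, size)

direct = poisson_pmf(lam, size)
max_diff = max(abs(s - d) for s, d in zip(total, direct))
print(f"largest difference from Po(12): {max_diff:.2e}")
```

The difference is at the level of floating-point rounding, confirming that a Po(12) variable really is a sum of twelve independent Po(1) variables, which is the form the CLT needs.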
The Power of CLT: Real-World Applications
Example 9 (CLT in Action: Quality Control):
A factory produces items where the final weight is affected by:
- Raw material variations
- Machine calibration drift
- Temperature fluctuations
- Operator differences
- Measurement errors
- … and many other small factors
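Even if the individual factors follow quite different distributions, the total effect (the sum of all factors) will be approximately normal, provided the factors are independent and no single one dominates the total. Strictly, this relies on extensions of the CLT to non-identically distributed variables, since the theorem as stated above assumes identical distributions.
This is why many quality control charts are built on the assumption of a normal distribution.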
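Here is a hedged simulation of the idea. The four disturbance distributions below are invented purely for illustration (the notes do not specify any), and the check is deliberately simple: for a normal distribution about 68% of values lie within one standard deviation of the mean.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)   # fixed seed so the run is repeatable

def item_deviation():
    """Deviation from target weight: the sum of 40 small independent factors (hypothetical)."""
    total = 0.0
    for _ in range(10):
        total += rng.uniform(-1, 1)           # flat disturbance
        total += rng.triangular(-1, 1, 0)     # peaked disturbance
        total += rng.choice((-1, 1))          # two-valued disturbance
        total += rng.expovariate(1.0) - 1.0   # skewed disturbance, shifted to mean 0
    return total

samples = [item_deviation() for _ in range(20_000)]
centre = mean(samples)
spread = pstdev(samples)

# Share of items within one standard deviation of the centre
within_one_sd = sum(abs(x - centre) <= spread for x in samples) / len(samples)

print(f"mean = {centre:.3f}, standard deviation = {spread:.3f}")
print(f"share within one standard deviation = {within_one_sd:.3f}")
```

None of the four ingredients is normal, and one is strongly skewed, yet the share within one standard deviation should come out near the normal benchmark because no single factor dominates the sum.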
Looking Forward: The Mathematical Foundation
The Mathematical Beauty: The Central Limit Theorem reveals a fundamental harmony in randomness: however chaotic the individual components might be, as long as they are independent with finite variance, their collective behavior gravitates toward the same universal pattern. The normal distribution serves as nature’s “attractor” for randomness.