S2 Chapter 2: The Poisson Distribution

Preface: From Battlefield Statistics to Modern Modeling

Welcome, mathematical explorers! Today we embark on a fascinating journey through time, where we’ll discover how the study of rare events — from deadly horse kicks in the Prussian army to cosmic phenomena — led to one of the most powerful tools in modern statistics: The Poisson Distribution.

Our story begins with a French mathematician whose name became synonymous with rare events, and whose work continues to illuminate patterns in everything from traffic flow to radioactive decay.

1. The Quest for Modeling Events in Continuous Time

Imagine you’re the proud owner of a breakfast shop in Guangdong Country Garden School. Through careful observation over many weeks, you’ve discovered that on average, you sell exactly 10 baozi during the morning hour (7:00–8:00 AM).

First Instinct: “This Sounds Like Binomial!”

Your first thought might be: “I’ll use the binomial distribution!” But then you pause and ask yourself:

What exactly are my ‘trials’?

Let’s think about dividing the hour into smaller time intervals:

Time Interval Division

The Pattern:

  • As we divide the hour into smaller intervals, the number of intervals $n$ increases
  • The probability $p$ of selling a baozi in each tiny interval decreases
  • But their product $np$ remains constant at 10 (our average sales rate)

The Mathematical Insight: We’re witnessing the transition from discrete binomial trials to a continuous process!
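
The pattern above can be checked numerically. The sketch below is illustrative only: the helper names `binomial_pmf` and `poisson_pmf`, the choice of interval counts (minutes, seconds, tenths of a second), and the choice to look at exactly 10 sales are all assumptions made for the demonstration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)

MEAN_SALES = 10  # average baozi sold per hour, from the text
K = 10           # illustrative choice: probability of exactly 10 sales

# Split the hour into n intervals while keeping n * p fixed at 10.
for n in (60, 3600, 36000):
    p = MEAN_SALES / n
    print(f"n = {n:>5}, p = {p:.6f}, binomial P(X = {K}) = {binomial_pmf(K, n, p):.5f}")

print(f"Poisson with mean 10:      P(X = {K}) = {poisson_pmf(K, MEAN_SALES):.5f}")
```

As $n$ grows, the binomial probabilities settle toward the Poisson value, which is the limiting behaviour the text describes.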

This isn’t just a mathematical curiosity — it has real implications for your baozi shop:

  • Customer arrivals are unpredictable: You can’t pinpoint exactly when each customer will arrive
  • Sales happen in continuous time: A customer could arrive at any moment during the hour
  • Rate is consistent: While individual sales are random, the average rate (10 per hour) is stable

This is exactly the situation where the Poisson distribution becomes our perfect tool!

Historical Context: From Battlefield to Breakfast Shop

Your baozi shop problem isn’t unique — mathematicians have been tackling similar “rare event” challenges for centuries. Let’s briefly explore how this powerful distribution was discovered:

Abraham de Moivre (1711): First discovered the mathematical pattern, though it remained largely unnoticed.

Siméon Denis Poisson (1837): Rediscovered and popularized the distribution in his work on legal statistics, modeling wrongful convictions.

Ladislaus Bortkiewicz (1898): Applied it to model Prussian cavalry deaths from horse kicks — rare, unpredictable events occurring at a steady average rate, just like your baozi sales!

2. The Mathematical Framework — Definition and Conditions

Definition (Poisson Distribution): A discrete random variable $X$ follows a Poisson distribution with parameter $\lambda > 0$, denoted $X \sim \text{Po}(\lambda)$, if its probability mass function is:

$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!} \quad \text{for } x = 0, 1, 2, 3, \ldots$$

Where:

  • $\lambda$ represents the average rate of occurrence
  • $e \approx 2.71828$ is Euler’s number
  • $x!$ is the factorial of $x$
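
The definition translates almost directly into code. This is a minimal sketch using only the standard library; the function name `poisson_pmf` and the input checks are illustrative choices, and the rate of 10 is borrowed from the baozi shop.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam), straight from the definition."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if x < 0:
        return 0.0  # the distribution only takes values 0, 1, 2, ...
    return exp(-lam) * lam**x / factorial(x)

# Baozi shop: an average of 10 sales in the morning hour.
for x in (5, 10, 15):
    print(f"P(X = {x}) = {poisson_pmf(x, 10):.4f}")

# The probabilities over x = 0, 1, 2, ... should sum to 1;
# 60 terms is more than enough when lam = 10.
print(sum(poisson_pmf(x, 10) for x in range(60)))
```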

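The Poisson distribution is not universal. It applies only when three conditions hold, and how well they hold determines how accurate the model is:

1. Independently: Events occur independently of one another. One arrival does not make another arrival more or less likely.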

2. Singly: In any infinitesimally small interval of time or space, at most one event can occur. The probability of two cars arriving at exactly the same microsecond is negligible.

3. Constant Rate: The average rate $\lambda$ of occurrence remains constant over time. The rate doesn’t change between morning and afternoon (if we’re modeling a period with consistent conditions).

Now let’s return to our opening challenge and solve it using the Poisson distribution!

Theorem (Expectation and Variance of Poisson): For $X \sim \text{Po}(\lambda)$:

  • Expected value: $E(X) = \lambda$
  • Variance: $\text{Var}(X) = \lambda$
  • Standard deviation: $\sigma = \sqrt{\lambda}$
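
A quick numerical check of the theorem (not a proof) is to sum the series for $E(X)$ and $E(X^2)$ directly. The sketch below assumes $\lambda = 2.5$ and truncates the sums at 100 terms; both choices, and the helper name `poisson_moments`, are illustrative.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_moments(lam, terms=100):
    """Approximate E(X) and Var(X) by truncating the defining sums."""
    mean = sum(x * poisson_pmf(x, lam) for x in range(terms))
    second_moment = sum(x * x * poisson_pmf(x, lam) for x in range(terms))
    return mean, second_moment - mean**2

mean, variance = poisson_moments(2.5)
print(f"mean     = {mean:.6f}")
print(f"variance = {variance:.6f}")
print(f"std dev  = {sqrt(variance):.6f}")
```

Both the mean and the variance come out equal to $\lambda$ up to floating-point error, as the theorem states.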

Theorem (Additivity of Independent Poisson Variables): If $X \sim \text{Po}(\lambda)$ and $Y \sim \text{Po}(\mu)$ are independent, then:

$$Z = X + Y \sim \text{Po}(\lambda + \mu)$$

Example (Real-World Additivity):

Scenario: A website receives an average of 15 visitors per hour from search engines ($X \sim \text{Po}(15)$) and 8 visitors per hour from social media ($Y \sim \text{Po}(8)$).

Total Traffic: The total number of visitors per hour follows $Z = X + Y \sim \text{Po}(23)$.

Interpretation: Combining independent Poisson processes creates another Poisson process with the sum of their rates.
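
Additivity can be sanity-checked for the website example by convolving the two distributions: $P(Z = z)$ is the sum over $k$ of $P(X = k)\,P(Y = z - k)$, which relies on independence. The helper name `convolution_pmf` and the values of $z$ tested are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def convolution_pmf(z, lam, mu):
    """P(X + Y = z) for independent X ~ Po(lam) and Y ~ Po(mu)."""
    return sum(poisson_pmf(k, lam) * poisson_pmf(z - k, mu) for k in range(z + 1))

# Search traffic Po(15) plus social traffic Po(8) against Po(23).
for z in (10, 23, 40):
    direct = poisson_pmf(z, 23)
    summed = convolution_pmf(z, 15, 8)
    print(f"z = {z:>2}: convolution = {summed:.6f}, Po(23) = {direct:.6f}")
```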

Remark: The properties above are proved in the Challenge Extension, where we derive the probability generating function of the Poisson distribution.

3. Guided Practice: Mastering Poisson Calculations

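The tabulated value is $P(X \leq x)$, where $X$ has a Poisson distribution with parameter $\lambda$. Columns are values of $\lambda$; rows are values of $x$.

| $x$ | 0.5 | 1.0 | 1.5 | 2.0 | 2.5 | 3.0 | 3.5 | 4.0 | 4.5 | 5.0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.6065 | 0.3679 | 0.2231 | 0.1353 | 0.0821 | 0.0498 | 0.0302 | 0.0183 | 0.0111 | 0.0067 |
| 1 | 0.9098 | 0.7358 | 0.5578 | 0.4060 | 0.2873 | 0.1991 | 0.1359 | 0.0916 | 0.0611 | 0.0404 |
| 2 | 0.9856 | 0.9197 | 0.8088 | 0.6767 | 0.5438 | 0.4232 | 0.3208 | 0.2381 | 0.1736 | 0.1247 |
| 3 | 0.9982 | 0.9810 | 0.9344 | 0.8571 | 0.7576 | 0.6472 | 0.5366 | 0.4335 | 0.3423 | 0.2650 |
| 4 | 0.9998 | 0.9963 | 0.9814 | 0.9473 | 0.8912 | 0.8153 | 0.7254 | 0.6288 | 0.5321 | 0.4405 |
| 5 | 1.0000 | 0.9994 | 0.9955 | 0.9834 | 0.9580 | 0.9161 | 0.8576 | 0.7851 | 0.7029 | 0.6160 |
| 6 | 1.0000 | 0.9999 | 0.9991 | 0.9955 | 0.9858 | 0.9665 | 0.9347 | 0.8893 | 0.8311 | 0.7622 |
| 7 | 1.0000 | 1.0000 | 0.9998 | 0.9989 | 0.9958 | 0.9881 | 0.9733 | 0.9489 | 0.9134 | 0.8666 |
| 8 | 1.0000 | 1.0000 | 1.0000 | 0.9998 | 0.9989 | 0.9962 | 0.9901 | 0.9786 | 0.9597 | 0.9319 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9997 | 0.9989 | 0.9967 | 0.9919 | 0.9829 | 0.9682 |
| 10 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9999 | 0.9997 | 0.9990 | 0.9972 | 0.9933 | 0.9863 |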
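
The cumulative values can be regenerated by summing the probability mass function from 0 up to $x$. This is a minimal sketch; the function name `poisson_cdf` and the print layout are illustrative choices.

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam), summed term by term."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

lams = [0.5 * i for i in range(1, 11)]  # 0.5, 1.0, ..., 5.0

# Header row of lambda values, then one row per x = 0, ..., 10.
print("  x" + "".join(f"{lam:>8.1f}" for lam in lams))
for x in range(11):
    print(f"{x:>3}" + "".join(f"{poisson_cdf(x, lam):>8.4f}" for lam in lams))
```

A probability such as $P(X > 5)$ is then `1 - poisson_cdf(5, lam)`, and $P(X = 5)$ is `poisson_cdf(5, lam) - poisson_cdf(4, lam)`, which mirrors how the printed table is used by hand.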

Example (Cybersecurity):

Background: A cybersecurity team monitors attempted intrusions on their network. Historical data shows that intrusion attempts occur at an average rate of 2.5 per day, and these attempts appear to be independent and random.

Modeling Decision: Let $X$ = number of intrusion attempts per day. We model $X \sim \text{Po}(2.5)$.

  1. What’s the probability of no intrusion attempts on a given day?
  2. What’s the probability of more than 5 attempts in one day?
  3. What’s the expected number of intrusion attempts in a week?
  4. The security team can handle up to 5 attempts per day effectively. Calculate the probability that, in a week of 7 days, the intrusion attempts are handled effectively every day.
  5. If they want to be adequately prepared 95% of the time, what should be their daily capacity?
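
One way to work through these five parts is sketched below. It rests on assumptions that the question leaves implicit: the threshold in part 2 is read as 5 attempts, the days in part 4 are treated as independent, and "adequately prepared" in part 5 is read as $P(X \leq c) \geq 0.95$ for a daily capacity $c$. The variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

LAM = 2.5  # intrusion attempts per day, from the example

p_none = poisson_pmf(0, LAM)             # part 1: P(X = 0)
p_more_than_5 = 1 - poisson_cdf(5, LAM)  # part 2: P(X > 5), threshold assumed to be 5
expected_week = 7 * LAM                  # part 3: the mean scales with the interval
p_week_ok = poisson_cdf(5, LAM) ** 7     # part 4: assumes the 7 days are independent

# part 5: smallest capacity c with P(X <= c) >= 0.95
capacity = 0
while poisson_cdf(capacity, LAM) < 0.95:
    capacity += 1

print(f"1. P(X = 0)        = {p_none:.4f}")
print(f"2. P(X > 5)        = {p_more_than_5:.4f}")
print(f"3. E(weekly count) = {expected_week}")
print(f"4. P(all 7 days)   = {p_week_ok:.4f}")
print(f"5. daily capacity  = {capacity}")
```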

Example (Quality Control in Manufacturing):

Context: A textile manufacturer produces large rolls of fabric. Quality control data shows that defects appear randomly at an average rate of 0.3 defects per square meter.

Question Series:

  1. In a 5 square meter section, what’s the probability of finding exactly 2 defects?
  2. What’s the probability that a 10 square meter section has no defects?
  3. If defects cost $15 each to repair, what’s the expected repair cost for a 20 square meter section?
  4. Two independent fabric sections of 3 square meters each are inspected. What’s the distribution of the total number of defects?
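
The key step in this series is scaling the rate to the area being inspected: 0.3 defects per square meter gives a mean of $0.3 \times \text{area}$ for a section of that area. The sketch below follows that reasoning; the constant and variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

RATE = 0.3          # defects per square meter, from the example
COST_PER_DEFECT = 15  # dollars per repair, from the example

p_two_in_5 = poisson_pmf(2, RATE * 5)     # part 1: mean 1.5 over 5 square meters
p_none_in_10 = poisson_pmf(0, RATE * 10)  # part 2: mean 3 over 10 square meters
expected_cost = RATE * 20 * COST_PER_DEFECT  # part 3: expected defects times cost
total_rate = RATE * 3 + RATE * 3          # part 4: additivity of independent sections

print(f"1. P(exactly 2 defects in 5 m^2) = {p_two_in_5:.4f}")
print(f"2. P(no defects in 10 m^2)       = {p_none_in_10:.4f}")
print(f"3. expected repair cost          = ${expected_cost:.2f}")
print(f"4. total defects ~ Po({total_rate:.1f})")
```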

Example (June 07 Q3): An engineering company manufactures an electronic component. At the end of the manufacturing process, each component is checked to see if it is faulty. Faulty components are detected at a rate of 1.5 per hour.

  (a) Suggest a suitable model for the number of faulty components detected per hour. (1)
  (b) Describe, in the context of this question, two assumptions you have made in part (a) for this model to be suitable. (2)
  (c) Find the probability of 2 faulty components being detected in a 1 hour period. (2)
  (d) Find the probability of at least one faulty component being detected in a 3 hour period. (3)
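
For the two numerical parts, a possible check is sketched below, assuming the model $\text{Po}(1.5)$ per hour and therefore $\text{Po}(4.5)$ for a 3 hour period. This is not an official mark scheme; the variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

RATE_PER_HOUR = 1.5  # faulty components per hour, from the question

p_two_in_1h = poisson_pmf(2, RATE_PER_HOUR)               # exactly 2 in 1 hour
p_at_least_one_3h = 1 - poisson_pmf(0, RATE_PER_HOUR * 3)  # complement of none in 3 hours

print(f"P(X = 2) in 1 hour   = {p_two_in_1h:.4f}")
print(f"P(Y >= 1) in 3 hours = {p_at_least_one_3h:.4f}")
```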

Example (Jan 10 Q3): A robot is programmed to build cars on a production line. The robot breaks down at random at a rate of once every 20 hours.

  (a) Find the probability that it will work continuously for 5 hours without a breakdown. (3)

Find the probability that, in an 8 hour period,

  (b) the robot will break down at least once, (3)
  (c) there are exactly 2 breakdowns. (2)

In a particular 8 hour period, the robot broke down twice.

  (d) Write down the probability that the robot will break down in the following 8 hour period. Give a reason for your answer. (2)
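
A possible numerical check, assuming breakdowns follow a Poisson model with rate $1/20$ per hour, so that a 5 hour period has mean 0.25 and an 8 hour period has mean 0.4. The final part is read here as "at least one breakdown in the next 8 hours"; that reading and the variable names are assumptions, not an official solution.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

RATE_PER_HOUR = 1 / 20  # one breakdown every 20 hours, from the question

p_no_breakdown_5h = poisson_pmf(0, RATE_PER_HOUR * 5)      # mean 0.25
p_at_least_one_8h = 1 - poisson_pmf(0, RATE_PER_HOUR * 8)  # mean 0.4
p_exactly_two_8h = poisson_pmf(2, RATE_PER_HOUR * 8)

# Breakdowns in separate periods are independent under the model,
# so the earlier two breakdowns do not change the next period's probability.
p_next_8h = p_at_least_one_8h

print(f"no breakdown in 5 hours   = {p_no_breakdown_5h:.4f}")
print(f"at least one in 8 hours   = {p_at_least_one_8h:.4f}")
print(f"exactly two in 8 hours    = {p_exactly_two_8h:.4f}")
print(f"following 8 hour period   = {p_next_8h:.4f}")
```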

Example (Jan 09 Q1): A botanist is studying the distribution of daisies in a field. The field is divided into a number of equal sized squares. The mean number of daisies per square is assumed to be 3. The daisies are distributed randomly throughout the field.

Find the probability that, in a randomly chosen square there will be

  (a) more than 2 daisies, (3)
  (b) either 5 or 6 daisies. (2)

The botanist decides to count the number of daisies, $x$, in each of 80 randomly selected squares within the field. The results are summarised below

$$\sum x = 295 \qquad \sum x^2 = 1386$$

  (c) Calculate the mean and the variance of the number of daisies per square for the 80 squares. Give your answers to 2 decimal places. (3)
  (d) Explain how the answers from part (c) support the choice of a Poisson distribution as a model. (1)
  (e) Using your mean from part (c), estimate the probability that exactly 4 daisies will be found in a randomly selected square. (2)
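
The numerical parts can be checked as sketched below. The variance is computed as $\frac{\sum x^2}{n} - \bar{x}^2$, and the final estimate uses the mean rounded to 2 decimal places; both are assumptions about the intended method rather than an official solution.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

# Model with mean 3 daisies per square.
p_more_than_2 = 1 - sum(poisson_pmf(k, 3) for k in range(3))  # 1 - P(X <= 2)
p_5_or_6 = poisson_pmf(5, 3) + poisson_pmf(6, 3)

# Summary statistics for the 80 sampled squares.
n, sum_x, sum_x2 = 80, 295, 1386
mean = sum_x / n
variance = sum_x2 / n - mean**2

# A mean and variance that are close to each other are consistent with
# a Poisson model, since E(X) = Var(X) for a Poisson distribution.
p_exactly_4 = poisson_pmf(4, round(mean, 2))

print(f"P(X > 2)      = {p_more_than_2:.4f}")
print(f"P(X = 5 or 6) = {p_5_or_6:.4f}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}")
print(f"P(X = 4) with estimated mean = {p_exactly_4:.4f}")
```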

Example (Jan 08 Q3):

  (a) State two conditions under which a Poisson distribution is a suitable model to use in statistical work. (2)

The number of cars passing an observation point in a 10 minute interval is modelled by a Poisson distribution with mean 1.

  (b) Find the probability that in a randomly chosen 60 minute period there will be
    • (i) exactly 4 cars passing the observation point,
    • (ii) at least 5 cars passing the observation point. (5)

The number of other vehicles, other than cars, passing the observation point in a 60 minute interval is modelled by a Poisson distribution with mean 12.

  (c) Find the probability that exactly 1 vehicle, of any type, passes the observation point in a 10 minute period. (4)
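
The numerical parts combine two ideas from earlier in the chapter: scaling the mean to the length of the interval, and adding the means of independent Poisson variables. The sketch below assumes cars and other vehicles pass independently; the variable names are illustrative and this is not an official solution.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

# Cars: mean 1 per 10 minutes, so mean 6 per 60 minutes.
cars_per_60 = 1 * 6
p_exactly_4_cars = poisson_pmf(4, cars_per_60)
p_at_least_5_cars = 1 - sum(poisson_pmf(k, cars_per_60) for k in range(5))

# Other vehicles: mean 12 per 60 minutes, so mean 2 per 10 minutes.
others_per_10 = 12 / 6
# Assuming independence, all vehicles per 10 minutes ~ Po(1 + 2).
all_per_10 = 1 + others_per_10
p_exactly_1_vehicle = poisson_pmf(1, all_per_10)

print(f"P(exactly 4 cars in 60 min)     = {p_exactly_4_cars:.4f}")
print(f"P(at least 5 cars in 60 min)    = {p_at_least_5_cars:.4f}")
print(f"P(exactly 1 vehicle in 10 min)  = {p_exactly_1_vehicle:.4f}")
```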

(Optional) The Binomial–Poisson Connection

The Poisson distribution emerges naturally as a limiting case of the binomial distribution under specific conditions.
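
One standard way to see this limit is sketched here, under the assumption that the conditions meant are $n \to \infty$ with $np = \lambda$ held fixed. Take $X \sim \text{B}(n, p)$ with $p = \lambda/n$ and fix a value $x$:

$$
\begin{aligned}
P(X = x) &= \binom{n}{x}\left(\frac{\lambda}{n}\right)^{x}\left(1 - \frac{\lambda}{n}\right)^{n - x} \\
&= \frac{\lambda^x}{x!}\cdot\frac{n(n-1)\cdots(n-x+1)}{n^x}\cdot\left(1 - \frac{\lambda}{n}\right)^{n}\cdot\left(1 - \frac{\lambda}{n}\right)^{-x} \\
&\longrightarrow \frac{\lambda^x}{x!}\cdot 1\cdot e^{-\lambda}\cdot 1 = \frac{e^{-\lambda}\lambda^x}{x!} \quad \text{as } n \to \infty.
\end{aligned}
$$

The second factor tends to 1 because it is a product of $x$ terms that each tend to 1, the third tends to $e^{-\lambda}$ by the limit definition of the exponential, and the fourth tends to 1 because $x$ is fixed.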

Challenge Extension: Probability Generating Functions for Poisson

Part I: Deriving the Poisson PGF — Two Approaches

Part II: Extracting Properties Using PGF Magic
