Derivation of Poisson Distribution from Probability Theory

The Poisson distribution makes it possible to model phenomena for which we would normally use the binomial distribution, but where the parameter n is so large that the calculations would become extremely complex. Instead of n, we consider time intervals t (in which n is large and roughly constant).

The Poisson distribution is used when we know the mean number λ of “successes” of a Bernoulli random variable observed a very large (and unknown) number n of times, with the observations spread uniformly over a fixed, repeated time interval (for example days, weeks, or hours).

For instance, the random variable might be the number of telephone calls per day received by an office, or the number of days a school is closed due to snow in winter (Walpole et al., 2007, p. 161). If we know the average number λ of telephone calls per day, but the number n of times someone could have made a call is unknown (and very large), the Poisson distribution can be used to model the probability distribution of the number of calls.

In the derivation of the Poisson distribution from the binomial distribution, the limit of the binomial formula is taken as n approaches infinity (with the mean np held fixed); therefore n must be very large for the Poisson distribution to model a given phenomenon. The Poisson distribution also greatly simplifies calculations: applying the binomial distribution to cases where the Poisson distribution should be used would require a prohibitive amount of computation, and in many cases n is unknown, so the binomial cannot be used at all.
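The limiting argument can be checked numerically. The following is a minimal sketch, assuming a hypothetical mean of λ = 4 successes per interval; the helper names `binomial_pmf`, `poisson_pmf`, and `max_gap` are illustrative only. It keeps np = λ fixed by setting p = λ/n and measures the largest pointwise difference between the binomial and Poisson probabilities as n grows.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability of exactly x successes in n trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of exactly x successes when the mean is lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def max_gap(lam: float, n: int, x_max: int = 15) -> float:
    """Largest |binomial - Poisson| over x = 0..x_max, with p = lam / n
    so that the mean np = lam stays fixed while n grows."""
    p = lam / n
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(x_max + 1))

if __name__ == "__main__":
    lam = 4.0  # hypothetical mean number of successes in the interval
    for n in (10, 100, 1000, 10000):
        print(f"n = {n:>5}: max |binomial - Poisson| = {max_gap(lam, n):.2e}")
```

The gap shrinks roughly in proportion to 1/n, which is why the Poisson formula is a good substitute only when n is large.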

The Poisson distribution is also known as the distribution of rare events, since it is used when the probability of success p in a binomial distribution is very low. In such cases the number of successes x grows very slowly as n increases, so it is better to consider very large values of n (otherwise the probability mass function would be concentrated on low values of x). Instead of time, space may be considered; the formulas and the underlying reasoning are substantially the same (Walpole et al., 2007, p. 162).


All the random, repeated events n occurring in the time interval t must be independent. Moreover, n must be large enough to justify the hypothesis of infinite n, and the events must be uniformly distributed over t.

How λ is defined

The Poisson distribution is usually derived from the binomial distribution, but it can also be derived easily from probability theory.

When the Poisson distribution is derived from the binomial distribution, λ, the average number of successes in a time interval t, is defined as

$$\lambda = np$$

When it is instead derived from probability theory, λ is defined as the average number of successes in the time interval t divided by t:

$$\lambda = \frac{np}{t}$$

In the first case, λ is the average number of successes in t; in the second case, λ is the average number of successes in t divided by t, i.e. the average number of successes per unit time. The formula of the Poisson distribution’s probability mass function (p.m.f.) therefore depends on how λ is defined.

Poisson distribution’s formula

When the Poisson distribution is derived from the binomial distribution, it is usually written as

$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \dots$$

When it is instead derived from probability theory, it is written as

$$P(x, t) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}, \qquad x = 0, 1, 2, \dots$$

The distribution, the basic reasoning, and the behavior are the same; the difference lies in how λ is defined and in the presence of t as an additional variable. Comparing the two definitions of λ, the mean number of successes μ in t can be written as

$$\mu = np = \lambda t$$
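As an illustration of the time-dependent form, here is a minimal sketch using a hypothetical call rate of λ = 3 calls per hour observed over t = 2 hours (these numbers are assumptions, not data from the source). The function name `poisson_pmf_time` is likewise illustrative. It evaluates P(x, t) and checks that the probabilities sum to 1 and that their mean is λt.

```python
import math

def poisson_pmf_time(x: int, lam: float, t: float) -> float:
    """P(x, t): probability of exactly x successes in an interval of length t,
    where lam is the average number of successes per unit time."""
    mu = lam * t  # mean number of successes in the interval
    return mu**x * math.exp(-mu) / math.factorial(x)

if __name__ == "__main__":
    lam = 3.0  # hypothetical: 3 calls per hour on average
    t = 2.0    # hypothetical: a 2-hour window
    probs = [poisson_pmf_time(x, lam, t) for x in range(60)]  # tail beyond 59 is negligible
    print(f"P(0 calls in {t} h) = {probs[0]:.4f}")
    print(f"sum of probabilities = {sum(probs):.6f}")
    print(f"mean = {sum(x * p for x, p in enumerate(probs)):.6f} (lam * t = {lam * t})")
```

Because only the product λt enters the formula, `poisson_pmf_time(x, lam, t)` gives the same value as the standard form evaluated with mean λt.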

The mean value in this case is therefore λt rather than just λ. As mentioned, the random events are uniformly distributed in the time interval t, so the number Δn of random events contained in a sub-interval Δt is

$$\Delta n = n\,\frac{\Delta t}{t}$$

The mean number of successes in a sub-interval Δt of t is therefore

$$\mu_{\Delta t} = p\,\Delta n = \frac{np}{t}\,\Delta t = \lambda\,\Delta t$$

Now consider an interval so small that it contains just one of the n random events in t, i.e. Δn = 1 and Δt = t/n; in the infinitesimal limit, Δt becomes dt, which approaches zero while n approaches infinity. Since such an interval holds either zero or one success, the probability that a success occurs equals its mean number of successes:

$$P(1, dt) = \lambda\,dt$$

Since there cannot be more than one success in dt, the probability of no success in dt is

$$P(0, dt) = 1 - \lambda\,dt$$
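The slicing argument can be imitated with a small Monte Carlo sketch. It assumes hypothetical values λ = 1, t = 2, n = 500 slices, and 5000 repetitions; `simulate_counts` is an illustrative name. Each slice of length dt = t/n independently holds one success with probability λ dt, and the total count over t should then behave approximately like a Poisson variable with mean λt.

```python
import math
import random

def simulate_counts(lam: float, t: float, n: int, trials: int, seed: int = 0) -> list[int]:
    """Split [0, t] into n slices of length dt = t / n. In each slice a success
    occurs with probability lam * dt (at most one per slice), independently.
    Returns the total number of successes in [0, t] for each trial."""
    rng = random.Random(seed)  # seeded for reproducibility
    dt = t / n
    p = lam * dt  # P(1, dt); must not exceed 1
    if p > 1:
        raise ValueError("n too small: lam * dt exceeds 1")
    return [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]

if __name__ == "__main__":
    lam, t = 1.0, 2.0  # hypothetical rate and interval length
    counts = simulate_counts(lam, t, n=500, trials=5000)
    mean = sum(counts) / len(counts)
    p0 = counts.count(0) / len(counts)
    print(f"empirical mean = {mean:.3f} (lam * t = {lam * t})")
    print(f"empirical P(0, t) = {p0:.3f} (exp(-lam t) = {math.exp(-lam * t):.3f})")
```

The simulated values carry sampling noise, so they should only be expected to agree with λt and e^(-λt) approximately.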

We now calculate the probability that no success occurs in the time interval t + dt, recalling that events are independent. By the multiplication rule of probability, the probability of no event in t + dt is the product of the probability of no event in t and the probability of no event in dt:

$$P(0, t + dt) = P(0, t)\,P(0, dt) = P(0, t)\,(1 - \lambda\,dt)$$

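Subtracting P(0, t) from both sides, dividing by dt, and letting dt approach zero, this can be written as

$$\frac{dP(0, t)}{dt} = -\lambda\,P(0, t)$$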

By integrating this differential equation with the boundary condition P(0,0) = 1, we get

$$P(0, t) = e^{-\lambda t}$$
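The recurrence P(0, t + dt) = P(0, t)(1 - λ dt) can also be iterated directly, without solving the differential equation. This minimal sketch, with hypothetical values λ = 0.5 and t = 3 and an illustrative function name `p0_by_recurrence`, starts from P(0, 0) = 1, applies the recurrence over `steps` equal slices, and compares the result with e^(-λt).

```python
import math

def p0_by_recurrence(lam: float, t: float, steps: int) -> float:
    """Apply P(0, s + dt) = P(0, s) * (1 - lam * dt) repeatedly,
    from P(0, 0) = 1 up to s = t, with dt = t / steps."""
    dt = t / steps
    p0 = 1.0  # boundary condition P(0, 0) = 1
    for _ in range(steps):
        p0 *= 1 - lam * dt  # no success in the next slice of length dt
    return p0

if __name__ == "__main__":
    lam, t = 0.5, 3.0  # hypothetical values
    exact = math.exp(-lam * t)
    for steps in (10, 100, 1000, 100000):
        approx = p0_by_recurrence(lam, t, steps)
        print(f"steps = {steps:>6}: recurrence = {approx:.6f}, exp(-lam t) = {exact:.6f}")
```

The iterated product equals (1 - λt/steps)^steps, which tends to e^(-λt) as the number of slices grows.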

Let us now calculate the probability of x events in t + dt by using both the addition rule and the multiplication rule of probability. The only ways to obtain x events in t + dt are that x events occur in t and none occurs in dt, or that x-1 events occur in t and 1 event occurs in dt:

$$P(x, t + dt) = P(x, t)\,(1 - \lambda\,dt) + P(x-1, t)\,\lambda\,dt$$

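Subtracting P(x, t) from both sides, dividing by dt, and letting dt approach zero, we get

$$\frac{dP(x, t)}{dt} + \lambda\,P(x, t) = \lambda\,P(x-1, t)$$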
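The chain of differential equations can also be solved numerically and compared with the closed form derived below. This is a sketch only, assuming hypothetical values λ = 2 and t = 1.5; the function names `integrate_system` and `poisson_pmf_time` are illustrative, and forward Euler is used because it is the simplest integrator, not because the source uses it.

```python
import math

def integrate_system(lam: float, t: float, x_max: int, steps: int) -> list[float]:
    """Forward-Euler integration of
        dP(0, s)/ds = -lam * P(0, s)
        dP(x, s)/ds = -lam * P(x, s) + lam * P(x-1, s)   for x >= 1
    from s = 0 to s = t, with P(0, 0) = 1 and P(x, 0) = 0 for x >= 1.
    Returns [P(0, t), P(1, t), ..., P(x_max, t)]."""
    dt = t / steps
    p = [1.0] + [0.0] * x_max  # boundary conditions at s = 0
    for _ in range(steps):
        # The new list is built from the old values of p before p is rebound,
        # so every right-hand side uses values from the previous step.
        p = [p[x] + dt * (-lam * p[x] + (lam * p[x - 1] if x > 0 else 0.0))
             for x in range(x_max + 1)]
    return p

def poisson_pmf_time(x: int, lam: float, t: float) -> float:
    """Closed form P(x, t) = (lam t)^x e^(-lam t) / x!."""
    mu = lam * t
    return mu**x * math.exp(-mu) / math.factorial(x)

if __name__ == "__main__":
    lam, t = 2.0, 1.5  # hypothetical rate and interval length
    numeric = integrate_system(lam, t, x_max=8, steps=20000)
    for x, value in enumerate(numeric):
        print(f"x = {x}: Euler = {value:.5f}, closed form = {poisson_pmf_time(x, lam, t):.5f}")
```

Stopping at `x_max` does not distort the lower equations, because the equation for x only involves x and x-1. Each Euler step is exactly the update P(x, s + dt) = P(x, s)(1 - λ dt) + P(x-1, s) λ dt, so after N steps the numbers coincide with binomial probabilities with N trials and p = λt/N, which links this derivation back to the binomial one.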

Multiplying both sides by e^(λt), we get

$$e^{\lambda t}\,\frac{dP(x, t)}{dt} + \lambda e^{\lambda t}\,P(x, t) = \lambda e^{\lambda t}\,P(x-1, t)$$

The left-hand side is the derivative of the product of e^(λt) and P(x, t):

$$\frac{d}{dt}\left[e^{\lambda t}\,P(x, t)\right] = \lambda e^{\lambda t}\,P(x-1, t)$$

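This differential equation can be solved step by step, starting from x = 1 and proceeding up to x. For x = 1, using P(0, t) = e^(-λt):

$$\frac{d}{dt}\left[e^{\lambda t}\,P(1, t)\right] = \lambda e^{\lambda t}\,e^{-\lambda t} = \lambda$$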

Integrating with the boundary condition P(1,0) = 0, we get

$$e^{\lambda t}\,P(1, t) = \lambda t \quad\Longrightarrow\quad P(1, t) = \lambda t\,e^{-\lambda t}$$

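Starting from this result, and using P(2,0) = 0, we can obtain P(2, t), and the pattern generalizes:

$$\frac{d}{dt}\left[e^{\lambda t}\,P(2, t)\right] = \lambda e^{\lambda t}\,P(1, t) = \lambda^2 t \quad\Longrightarrow\quad P(2, t) = \frac{(\lambda t)^2 e^{-\lambda t}}{2!}$$

$$P(x, t) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}$$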

The above result is thus proved by induction.
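The inductive step can be written out as a short worked derivation. It is a sketch that relies only on the relation d/dt[e^(λt) P(x, t)] = λ e^(λt) P(x-1, t), the boundary condition P(x, 0) = 0 for x ≥ 1, and the base case P(0, t) = e^(-λt), which matches the general formula for x = 0. Assuming the formula holds for x-1:

```latex
\begin{aligned}
&\text{Hypothesis:}\quad P(x-1, t) = \frac{(\lambda t)^{x-1} e^{-\lambda t}}{(x-1)!} \\
&\frac{d}{dt}\left[e^{\lambda t} P(x, t)\right]
  = \lambda e^{\lambda t} P(x-1, t)
  = \frac{\lambda^{x} t^{x-1}}{(x-1)!} \\
&e^{\lambda t} P(x, t)
  = \int_0^t \frac{\lambda^{x} s^{x-1}}{(x-1)!}\,ds
  = \frac{\lambda^{x} t^{x}}{x\,(x-1)!}
  = \frac{(\lambda t)^{x}}{x!} \\
&P(x, t) = \frac{(\lambda t)^{x} e^{-\lambda t}}{x!}
\end{aligned}
```

The integral starts from 0 because e^0 P(x, 0) = 0, so no integration constant appears.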