What exactly is a "rare event"? (Poisson point process)

Livio Arshavin Leiva · Apr 24, 2018

Lately I've been reading on the internet about the Poisson distribution, a concept I never managed to understand completely when I studied it. Ever since, I've been curious about Poisson processes and about how many natural phenomena (mostly the random ones) can be described very accurately by modeling them as Poisson processes. One of those examples was the number of goals in a World Cup football match, which interested me because I love football, and relating it to probability and statistics in a formal way seemed fun.
So I started looking for the facts that justify modeling goals as a Poisson process. For example, one can plot the histogram of the number of events in a fixed time and compare it with a Poisson distribution; for goals the fit is very good. I also saw that the time intervals between goals follow an exponential distribution, as expected for a Poisson process. Not exactly exponential, because right after a goal is scored there is a period in which both teams have to "digest" the goal in some way, so for very short times (less than 5 minutes) the exponential fit was poor, but beyond that it was very good.
On the other hand, the distribution of goals over the course of a match (that is, dividing the 90 minutes into subintervals of about 10 minutes and counting the total goals in each), which should be constant for a Poisson process, was approximately constant except for a "psychological" effect: there were too few goals at the beginning of the match (both teams have to "settle on the field" first) and too many late goals (presumably the result, for better or worse, of taking risks at the end when you're losing).
What I want to say is that there were some "indicators" that it could be a Poisson process, but the most important one, since it is used to derive the Poisson distribution from a binomial distribution, is the fact that the events have to be "rare". In the derivation, the limit applied to the binomial distribution consists of letting the number of trials ##n## tend to infinity while keeping the average number of successes ##\Lambda=np## constant, with ##p## the probability of success in each trial. That is, ##p## tends to 0. This is where one could get the notion of what a rare event is. But there are many cases where there is nothing one can identify as a "trial". In radioactive decay, for example, there are no such trials. One cannot say: "Wait a minute, this atom here may be trying to decay now! Let me check. Oh no, it missed, it is still uranium." So one cannot compute a probability of success ##p##, nor design an experiment to produce an estimator of ##p##.
So, finally (and I apologize for the length), my question is: how can I determine whether certain events that are happening are really a Poisson process? I know the main condition is that the events must occur independently, but how does that relate to the "rare" issue? For example, in Drude's theory of metal conductivity it works to model the electron collisions as a Poisson process, yet millions of them occur every second; they are happening all the time. So they're not rare. It would seem that the most important fact is the "randomness" and not the rarity after all. I mean, mathematically a Poisson process is a set of points in a certain domain, and whether they are frequent or rare is just a scale factor. But in reality, maybe there is some kind of causality argument? One might expect that events very distant in time lose their influence on one another.
I mean, "very distant in time" compared with the characteristic times of the mechanisms that generate the process...
Just wondering, if somebody knows the real justification of the thing...
Thank you very much in advance.

Stephen Tashi · Apr 24, 2018

Livio Arshavin Leiva said:

my question is: how can I determine whether certain events that are happening are really a Poisson process?

That question doesn't match the title of your post, but let's consider it.

The question has two possible interpretations.
1) How can I examine data from phenomena and determine if it is generated by a Poisson process?

2) How can I use a physical model of the details of a phenomenon to prove the phenomenon is generated by a Poisson process?

As to 1) there are various ways of quantitatively measuring the fit of a probability distribution to data. None of them give an answer in the sense of mathematically proving "Does fit" or proving "Doesn't fit".
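One simple diagnostic of this kind, offered only as an illustrative sketch and not as something named in this post, is the index of dispersion (variance divided by mean) of the counts per window, which is close to 1 for Poisson-distributed counts. The data below are simulated; the rate 2.7, the window length of 1 and the helper names `counts_per_window` and `dispersion_index` are assumptions made for the example. A value near 1 is consistent with a Poisson model but does not prove it.

```python
import random
import statistics


def counts_per_window(rate, n_windows, rng):
    """Simulate arrivals with exponential gaps; count them in unit-length windows."""
    counts = [0] * n_windows
    t = rng.expovariate(rate)
    while t < n_windows:
        counts[int(t)] += 1          # window index is the integer part of the time
        t += rng.expovariate(rate)   # next arrival after an exponential gap
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio of the counts; close to 1 for Poisson counts."""
    return statistics.pvariance(counts) / statistics.fmean(counts)


rng = random.Random(1)  # fixed seed so the run is repeatable
counts = counts_per_window(rate=2.7, n_windows=20_000, rng=rng)
print("mean count per window:", statistics.fmean(counts))
print("dispersion index:", dispersion_index(counts))
```

With real data one would replace `counts` by the observed counts per interval; a dispersion index well away from 1 is evidence against a Poisson model.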

From what you wrote, I assume you are interested in 2).

The fact that the model for a physical phenomenon is the mathematical limit of a sequence of processes doesn't force Nature to actually implement each process in the sequence. For example, in classical physics, when we define the velocity (in 1 dimension) of an object "at" time ##t##, we define it as ##v(t) = \lim_{h \rightarrow 0} (x(t+h) - x(t))/h##, where ##x(t)## is the position of the object. This doesn't imply that Nature actually implements a phenomenon where an object has one position at time ##t## and then suddenly jumps to a new position at time ##t+h## for a given finite ##h##. You can, however, imagine an experiment where we only measure the position of an object at time ##t## and time ##t+h## and make no measurements in between those times.

Likewise, in a model for radioactive decay, Nature doesn't necessarily implement a process involving a large set of independent trials. However, we may imagine an experiment where the only data recorded is whether a decay occurs in the time intervals [0,h], [h,2h], [2h,3h], etc. If the time between decays is exponentially distributed, the experimental outcomes can be approximated by a Poisson distribution.

The assumption that the time between events is exponentially distributed is very specific. Less specific assumptions can be made and analyzed. E.g., assume that the probability of a decay in each interval is some constant ##p## (which may depend on ##h##), independently of what happens in previous intervals. To get a result from taking a limit as ##h## approaches 0, we may add some assumption about the rarity of events. For example, we can assume ##p/h## approaches some constant value.

The assumption that events in a process are "rare" isn't sufficient, by itself, to demonstrate that the count of events follows a Poisson process or can be well approximated by a Poisson process.
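As a sketch of how the limit could go under the constant-probability-per-interval assumption above (the symbols ##\lambda##, ##t##, ##n## and ##N## are introduced here for illustration): split ##[0,t]## into ##n = t/h## intervals, and assume each one contains a decay with probability ##p = \lambda h = \lambda t/n##, independently of the others, so that ##p/h## is the constant ##\lambda##. The number ##N## of intervals containing a decay is then binomial, and

$$P(N=k) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n(n-1)\cdots(n-k+1)}{n^k}\,\frac{(\lambda t)^k}{k!}\,\left(1-\frac{\lambda t}{n}\right)^{n-k} \;\xrightarrow[n\to\infty]{}\; \frac{(\lambda t)^k}{k!}\,e^{-\lambda t}.$$

The first factor tends to 1 and the last to ##e^{-\lambda t}##. Independence between intervals is what makes the count binomial in the first place; the "rarity" enters only through ##p = \lambda h \to 0##.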

StoneTemplePython · Apr 24, 2018

Stephen Tashi said:

If the time between decays is exponentially distributed, the phenomena of the experimental outcomes can be approximated by a Poisson distribution.

The assumption that the time between events is exponentially distributed is very specific.

This is it.

The key is to focus on the inter-arrival distribution. If it's continuous and memoryless, then you have an exponential distribution for inter-arrivals, and the Poisson process counts it for you.

- - - -

Livio Arshavin Leiva said:

But in the reality, maybe there is some kind of causality argument? Expecting that very distanced in time events should dissipate the influence of one on another? I mean, "very distanced in time" compared with the characteristics times of the processes generation mechanisms..

For a real-world example, consider earthquake modeling (the main ones, not the aftershocks). Last I read, these are modeled by a continuous memoryless process -- i.e. an exponential distribution for inter-arrival times. It isn't because people think earthquakes are 'truly memoryless'; it's just that more complicated models have repeatedly done worse (or at least no better) in prediction quality.
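A quick numerical check of what "memoryless" means for exponential inter-arrival times, as a sketch only: the rate 1.5, the sample size and the helper name `conditional_survival` are arbitrary choices for illustration. It compares the empirical ##P(T > s+t \mid T > s)## with ##e^{-\lambda t}## for several elapsed times ##s##; for a memoryless distribution the result should not depend on ##s##.

```python
import math
import random


def conditional_survival(samples, s, t):
    """Empirical P(T > s + t | T > s) from a list of waiting times."""
    survivors = [x for x in samples if x > s]
    return sum(x > s + t for x in survivors) / len(survivors)


rng = random.Random(0)  # fixed seed so the run is repeatable
rate = 1.5              # assumed arrival rate (illustrative)
samples = [rng.expovariate(rate) for _ in range(200_000)]

t = 0.4
for s in (0.0, 0.5, 1.0):
    # Already having waited s should not change the chance of waiting t more.
    print(s, conditional_survival(samples, s, t), math.exp(-rate * t))
```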

Livio Arshavin Leiva · Apr 24, 2018

StoneTemplePython said:

This is it.

The key is to focus on the inter-arrival distribution. If it's continuous and memoryless, then you have an exponential distribution for inter-arrivals, and the Poisson process counts it for you.

- - - -
For a real-world example, consider earthquake modeling (the main ones, not the aftershocks). Last I read, these are modeled by a continuous memoryless process -- i.e. an exponential distribution for inter-arrival times. It isn't because people think earthquakes are 'truly memoryless'; it's just that more complicated models have repeatedly done worse (or at least no better) in prediction quality.

Thank you for your answers. So you mean that a memoryless continuous distribution (the exponential distribution is the only one, as far as I know) for the inter-arrival times implies that those events are a Poisson process? I'm not sure. A Poisson process has an exponential distribution for the inter-arrival times, but the converse is not necessarily true, is it?

Livio Arshavin Leiva · Apr 24, 2018

But what I wanted to emphasize in the thread is that the "rarity" of a certain phenomenon is not well defined. Or at least it is not well defined when you cannot talk about trials and successes. I mean, one could set a certain tolerance for the exponential approximation vs the binomial distribution itself, for example 1%, and so determine the maximum value of ##p## for which this is valid. But in most cases where real events are modeled as a Poisson process, the "rarity" cannot be defined. So my conclusion is that in those cases the key factor is the non-correlation between events, and this is a stronger condition than the "rarity" of the events. So I understand that the randomness is the important issue, and the rarity is just a friendly way to get a "more intuitive" notion of what non-correlation really means.

Livio Arshavin Leiva · Apr 24, 2018

Stephen Tashi said:

The fact that the model for a physical phenomena is the mathematical limit of a sequence of process doesn't force Nature to actually implement each process in the sequence.

This is, in some way, what I was trying to figure out. It seems to be true. The Poisson distribution may be a limit of the binomial distribution, but it can be applied to a wider variety of cases, even beyond boolean random variables. And I think there may be another way to obtain the formula that uses only the non-correlation between events, without having to refer to a probability of success, just an average rate ##\Lambda##, which is in fact what finally appears in the distribution formula...
Thank you for your answer.

StoneTemplePython · Apr 24, 2018

Livio Arshavin Leiva said:

Thank you for your answers. So you mean that a memoryless continuous distribution (the exponential distribution is the only one as far as I know) for the inter-arrival times implies that those events are a Poisson process? I'm not sure. A Poisson process has an exponential distribution for the inter-arrival times, but the reciprocate is not necessarily valid, isn't it?

A lot of what you're saying is close.

Livio Arshavin Leiva said:

But what I wanted to emphasize in the thread is the fact that the "rarity" of a certain phenomena is not well defined. Or at least it is not well defined when you can not talk about trials and successes.

I wouldn't fixate on the rarity -- you can always tweak the ##\lambda##. But on account of being in continuous time the Poisson has zero probability of an arrival at any specific time ##t##.

Livio Arshavin Leiva said:

the exponential approximation vs the binomial distribution itself

This doesn't make a whole lot of sense. The inter-arrival distribution is exponential in continuous time, and in discrete time it's a geometric distribution. The Binomial distribution is intimately tied with the latter, but not at all the same thing. You should be able to work through a coin tossing example to appreciate this.

Livio Arshavin Leiva said:

But in most of the cases where some events in the reality are modeled as a Poisson process, the "rarity" can not be defined. So my conclusion is that in those cases the key factor is the non correlation between events, and this is an stronger condition than the "rarity" of the events. So I understand that the randomness is the important issue, and the rarity is just a friendly way to have a "more intuitive" notion of what non-correlation really means.

Memorylessness isn't the same thing as being uncorrelated. Some deeper understanding is needed to appreciate these points.

If you start modelling some stochastic processes, the meanings become clear. The Poisson (Bernoulli) process is the continuous-time (discrete-time) counting process with an idealized inter-arrival distribution that has this great property called memorylessness.
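The coin-tossing exercise mentioned above could be sketched as follows. This is a minimal illustration, assuming a biased coin with success probability 0.2; the helper names `gaps_between_successes` and `geometric_pmf` are made up for the example. It tosses the coin many times, records the number of tosses between consecutive successes, and compares their frequencies with the geometric distribution ##p(1-p)^{k-1}##.

```python
import random


def gaps_between_successes(p, n_trials, rng):
    """Toss a p-coin n_trials times; return the toss counts between successes."""
    gaps, last = [], -1
    for i in range(n_trials):
        if rng.random() < p:      # success on toss i
            gaps.append(i - last)  # tosses since the previous success
            last = i
    return gaps


def geometric_pmf(k, p):
    """P(first success on toss k) for k = 1, 2, ..."""
    return p * (1 - p) ** (k - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.2                 # assumed success probability (illustrative)
gaps = gaps_between_successes(p, 500_000, rng)

for k in range(1, 6):
    empirical = sum(g == k for g in gaps) / len(gaps)
    print(k, round(empirical, 4), round(geometric_pmf(k, p), 4))
```

The number of successes in a fixed block of tosses from the same sequence is binomial, which is the sense in which the binomial is tied to the geometric inter-arrival distribution without being the same thing.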

Livio Arshavin Leiva · Apr 24, 2018

StoneTemplePython said:

This doesn't make a whole lot of sense. The inter-arrival distribution is exponential in continuous time, and in discrete time it's a geometric distribution. The Binomial distribution is intimately tied with the latter, but not at all the same thing. You should be able to work through a coin tossing example to appreciate this.

Sorry, what I meant was the form of the Poisson distribution (which is exponential in ##\Lambda##), ##\Lambda^k e^{-\Lambda}/k!## (a distribution of a discrete variable), as an approximation of the actual binomial distribution. Here one could compare the binomial term for each number of occurrences ##k## with the Poisson distribution. I know that the exponential distribution is the continuous distribution of a continuous variable, the time between arrivals. Referring to the Poisson distribution itself as an exponential, because of the exponential function inside it, was probably confusing; it was not my intention to mix up the discrete Poisson with the continuous exponential. My mistake.

StoneTemplePython said:

Memorylessness isn't the same thing as being uncorrelated. Some deeper understanding is needed to appreciate these points.

So you mean that having random non correlated points in a certain domain is not enough to determine those points are a Poisson process?

StoneTemplePython · Apr 24, 2018

Livio Arshavin Leiva said:

So you mean that having random non correlated points in a certain domain is not enough to determine those points are a Poisson process?

(a) Suppose you have a counting process that counts arrivals, where the arrival process starts anew the instant an arrival occurs. The distribution of each inter-arrival time is exponential. The inter-arrival times you record are a bunch of independent variables, and hence have zero correlation. Your counting process is called a Poisson process.

(b) Now suppose there is a counting process B, exactly the same as the above, except you rely on me to tell you when the arrivals occur, and I secretly hide the odd-numbered arrivals and only tell you when the even-numbered arrivals occur. But you don't know this -- you only hear me say "arrival" and you record the time. There is still zero correlation between the elapsed times for each "arrival" period that you record (why?). But this is not a Poisson process anymore.

It's a bit like talking about animals that are non-elephants... there's lots and lots of different examples.

- - - -
edit: What I'm hinting at heavily is that there is a large collection of arrival processes that have IID (independent, identically distributed) inter-arrival times. This collection is called "renewal processes". The simplest forms of renewal processes are memoryless -- i.e. the Poisson and Bernoulli processes.
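Example (b) can be checked numerically. The following is a sketch under stated assumptions: the hidden process has rate 1, and the helper names `reported_gaps`, `lag1_correlation` and `coefficient_of_variation` are invented for the illustration. Each reported gap is the sum of two consecutive exponential gaps. The gaps stay uncorrelated, but their coefficient of variation comes out near ##1/\sqrt{2}## instead of the value 1 that an exponential distribution would give, so the reported process is a renewal process that is not Poisson.

```python
import math
import random
import statistics


def reported_gaps(rate, n_gaps, rng):
    """Gaps between reported arrivals when every odd-numbered arrival is hidden."""
    return [rng.expovariate(rate) + rng.expovariate(rate) for _ in range(n_gaps)]


def lag1_correlation(xs):
    """Pearson correlation between each gap and the one that follows it."""
    a, b = xs[:-1], xs[1:]
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def coefficient_of_variation(xs):
    """Standard deviation divided by mean; equals 1 for an exponential law."""
    return statistics.pstdev(xs) / statistics.fmean(xs)


rng = random.Random(42)  # fixed seed so the run is repeatable
gaps = reported_gaps(rate=1.0, n_gaps=100_000, rng=rng)
plain = [rng.expovariate(1.0) for _ in range(100_000)]  # unthinned, for comparison

print("lag-1 correlation of reported gaps:", lag1_correlation(gaps))
print("CV of reported gaps:", coefficient_of_variation(gaps))
print("CV of exponential gaps:", coefficient_of_variation(plain))
```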

Livio Arshavin Leiva · Apr 24, 2018

StoneTemplePython said:

(a) Suppose you have a process that counts arrivals, where the process starts anew the instant an arrival occurs. The distribution of each inter-arrival time is exponential. The inter-arrival times you record are a bunch of independent variables, and hence have zero correlation. Your counting process is called a Poisson process.

(b) Now suppose there is a counting process B, exactly the same as the above, except you rely on me to tell you when the arrivals occur, and I secretly hide the odd-numbered arrivals and only tell you when the even-numbered arrivals occur. But you don't know this -- you only hear me say "arrival" and you record the time. There is still zero correlation between the elapsed times for each "arrival" period that you record (why?). But this is not a Poisson process anymore.

It's a bit like talking about animals that are non-elephants... there's lots and lots of different examples.

Sorry for my ignorance but I don't get it... Why counting the even numbered arrivals wouldn't be a new poisson process constructed from the first one?

StoneTemplePython · Apr 24, 2018

Livio Arshavin Leiva said:

Sorry for my ignorance but I don't get it... Why counting the even numbered arrivals wouldn't be a new poisson process constructed from the first one?

The Poisson process is the only continuous-time counting process that is memoryless -- and it has exponentially distributed inter-arrival times, aka inter-arrival times whose distribution is Erlang of order 1. The counting process in (b) has an inter-"arrival" distribution that is Erlang of order 2 -- i.e. a convolution of two exponential distributions. That distribution is not memoryless, and hence the process is not a Poisson process.

For better or worse, you need to get your hands dirty in the math to really make sense of this stuff.
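To make the Erlang-of-order-2 claim concrete, here is a short derivation, assuming the hidden process has rate ##\lambda## and writing ##T = X_1 + X_2## for one reported gap, with ##X_1, X_2## independent exponentials (these symbols are introduced for illustration). The density is the convolution

$$f_T(t) = \int_0^t \lambda e^{-\lambda s}\,\lambda e^{-\lambda (t-s)}\,ds = \lambda^2 t\,e^{-\lambda t}, \qquad P(T>t) = (1+\lambda t)\,e^{-\lambda t}.$$

Memorylessness would require ##P(T>s+t \mid T>s) = P(T>t)##, but for ##s,t>0##

$$P(T>s+t \mid T>s) = \frac{1+\lambda(s+t)}{1+\lambda s}\,e^{-\lambda t} \;<\; (1+\lambda t)\,e^{-\lambda t} = P(T>t),$$

since ##(1+\lambda s)(1+\lambda t) = 1+\lambda(s+t)+\lambda^2 st##. The longer you have already waited, the sooner the next reported "arrival" is likely to come, so the reported process has memory.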

Livio Arshavin Leiva · Apr 25, 2018

StoneTemplePython said:

The Poisson process is the only continuous time distribution that is memoryless -- and it has exponentially distributed inter-arrival times aka inter-arrival times that have a distribution that's Erlang of order 1. What you are counting in (b) has an inter-"arrival" distribution that is Erlang of order 2 - - i.e. a convolution of two exponential distributions. That is not memoryless and hence is not a Poisson.

For better or worse, you need to get your hands dirty in the math to really make sense of this stuff.

I see... the time interval between even-numbered events is the sum of two exponentially distributed variables, so its distribution is the convolution of two exponential distributions. It's quite strange, maybe not very "intuitive", but then observing an exponential distribution for the inter-arrival times is a much better indicator than the absence of correlation between events when trying to determine whether certain events really are a Poisson process. Is that right?

FactChecker · Apr 25, 2018

The official properties of an ideal Poisson process do not require any reference to binomial or "rare".

That being said, the limit of a binomial distribution as the probability of each success becomes smaller (with the expected number of successes held fixed) is a Poisson distribution. If a binomial distribution is to be considered approximately Poisson, the success rate of the binomial must be small. But that is just an approximation. There are many examples of binomial processes that can be very well approximated as a Poisson process. At the moment, I cannot think of a Poisson example in the real world that is not actually a binomial distribution of a rare occurrence. (Here, "rare" means that there is a very small fraction of successes. There may be an enormous number of trials per unit time, so there may actually be a lot of successes in a time interval.)
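The quality of that approximation can be checked directly. The sketch below is illustrative only: the mean ##\Lambda = 2.5##, the range of ##k## and the helper names are arbitrary choices. It holds ##\Lambda = np## fixed and prints the largest gap between the binomial and Poisson probabilities as the number of trials grows and the per-trial success probability shrinks.

```python
import math


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    # math.comb returns 0 when k > n, so the pmf is 0 there as expected.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(k events) for a Poisson distribution with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def max_abs_gap(n, lam, k_max=20):
    """Largest |binomial - Poisson| over k = 0..k_max, with p = lam / n."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1)
    )


for n in (10, 100, 1000, 10000):
    print(n, 2.5 / n, max_abs_gap(n, lam=2.5))
```

The gap shrinks as ##p = \Lambda/n## gets smaller even though the expected number of successes stays the same, which is the sense of "rare" used in the parenthetical above.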
