Approximation of a process by independent draws

  • #1
Stephen Tashi
Another thread in this section (https://www.physicsforums.com/showthread.php?t=619945) raises some interesting questions. I can't figure out what the original poster there is asking, so I think it best to put what I'm asking in a new thread.

A simple "field test" of a system that detects a target would be to position the target (such as an aircraft) at a great distance from the detector (such as a rader) and then move the target closer in a straight line. Repeating such a test and recording the distance at which detection first takes place gives a distribution of ranges. A typical analysis of the results is to fit a curve F(x) to the cumulative histogram of the ranges. I've often seen the curve F(x) mis-named as "The Probability Of First Detection Vs Range". It actually is "The Cumulative Distribution Of Range At Which First Detection Occurs".

Suppose we wish to write a stochastic simulation of the above test that depicts events as the test progresses. We divide the total time of the test into equal intervals, and these define the "steps" of our simulation. We wish to simulate the first detection by making an independent random draw from a Bernoulli random variable at each step until a detection occurs. (i.e. this variable represents "first detection occurs" or "first detection does not occur".) In general, the Bernoulli random variables will have different distributions at different steps. We want the distribution of first detection ranges that we get from the simulation to match the curve F(x). How do we solve for the distributions of the appropriate Bernoulli random variables?

In the code of many simulations and in many technical reports, I have seen the following done:
At step N, find the distance x between the detector and the target and let the probability of a first detection at that step be F(x). (I have also seen people use F'(x).) This is a nonsensical model: if you divide a given interval into ever finer intervals and apply the rule, you increase the probability of detection within that interval, simply because you make more independent random draws.
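To make the defect concrete, here is a minimal sketch (Python; the curve F and all the numbers are made up for illustration). It computes the probability of at least one detection while the target closes over a fixed interval of ranges, under the flawed rule:

[code]
import numpy as np

# Hypothetical "backwards" curve of the kind often mislabeled
# "probability of first detection vs range": near 1 at short
# range, near 0 at long range.  Units of km are invented.
def F(x):
    return np.exp(-x / 20.0)

def p_detect_flawed(n_steps, x_far=100.0, x_near=50.0):
    """Probability of at least one detection while the target
    closes from x_far to x_near, using the flawed rule that the
    per-step detection probability is F(current range)."""
    xs = np.linspace(x_far, x_near, n_steps)
    return 1.0 - np.prod(1.0 - F(xs))

for n in (10, 100, 1000):
    print(f"{n:5d} steps: P(detect in [50, 100] km) = {p_detect_flawed(n):.3f}")
[/code]

Refining the steps drives the answer toward 1 (roughly 0.28, 0.95, 1.00 here) even though nothing physical has changed.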

So what ought to be done?
 
  • #2
I don't know if I got this right, but if you have a collection of Bernoulli random variables, each with different probabilities depending on what happens in past trials, then I think you can model the distribution of the collection of Bernoulli variables in terms of a probability generating function.

The generating function will have to take into account how the probabilities of later trials are affected by prior ones and it will probably be complicated.

The only catch is that a PGF assumes independence between the variables, whereas here each distribution seems to depend on the value of the realization of the last trial.

What you could do is use a variant of the PGF to basically 'entangle' realizations of past results to affect the probability density functions of the next random variable and then take this and decompose it into a PGF.

For example, let's say X(1) = [0,1] with [1/2,1/2] probability, and X(2) = [0,1] with [1,0] probability if X(1) = 1 (detected), and otherwise X(2) = [0,1] with [1/4,3/4] probability.

Then we can use X(2) = (1 - X(1))Y where Y is distributed on [0,1] with [1/4,3/4] probabilities.

Doing this kind of thing for successive variables allows an expansion and a decomposition which can be used to generate a PGF.
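As a quick numerical sanity check of that construction (a sketch in Python; the probabilities are the ones in the example above):

[code]
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# X1 ~ Bernoulli(1/2); Y ~ Bernoulli(3/4) independent of X1;
# X2 = (1 - X1) * Y, so a detection on trial 1 forces X2 = 0.
x1 = rng.random(n) < 0.5
y = rng.random(n) < 0.75
x2 = (~x1) & y

print(x2.mean())       # P(X2 = 1) = (1/2)(3/4) = 0.375
print(x2[x1].mean())   # P(X2 = 1 | X1 = 1) = 0
print(x2[~x1].mean())  # P(X2 = 1 | X1 = 0) = 3/4
[/code]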

There might be problems with this, so I'm interested in any feedback.
 
  • #3
chiro said:
I don't know if I got this right, but if you have a collection of Bernoulli random variables, each with different probabilities depending on what happens in past trials, then I think you can model the distribution of the collection of Bernoulli variables in terms of a probability generating function.

I don't understand how to relate the distribution of detection ranges F'(x) to a generating function.

I'll assume "trials" means "steps" of the simulation = trials of the Bernoulli random variables.

Suppose at the [itex]n[/itex]th step of the simulation, we are at elapsed time [itex] t_n [/itex] and distance [itex] x_n [/itex], and let [itex] p_n [/itex] be the (as yet unknown) probability that first detection occurs in that step, given that it has not occurred on previous steps.

Assume the target is moving straight toward the detector, not back and forth or side to side or standing still.

The (unconditional) probability [itex] D_n [/itex] that the first detection happens on step [itex] n [/itex] is
[itex] D_n = (1-p_1)(1-p_2)\cdots(1-p_{n-1})\,p_n [/itex]

So I'd think that [itex] D_n [/itex] must be approximately equal to [itex] F(x_n) - F(x_{n-1}) = \int_{x_{n-1}}^{x_n} F'(x) dx [/itex], which is the fraction of detections predicted to happen between ranges [itex] x_{n} [/itex] and [itex] x_{n-1} [/itex].
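If that approximation is accepted, the [itex] p_n [/itex] can be solved for recursively, since pr( detection on some step before [itex] n [/itex] ) [itex] = \sum_{k=1}^{n-1} D_k = 1 - \prod_{k=1}^{n-1}(1-p_k) [/itex]. A sketch of the algebra (assuming the test starts at a range [itex] x_0 [/itex] where [itex] F(x_0) \approx 0 [/itex]):

[tex]
p_n \;=\; \frac{D_n}{\prod_{k=1}^{n-1}(1-p_k)}
    \;=\; \frac{D_n}{1 - \sum_{k=1}^{n-1} D_k}
    \;\approx\; \frac{F(x_n) - F(x_{n-1})}{1 - F(x_{n-1})}.
[/tex]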

Can this line of reasoning lead to a generating function?
 
  • #4
I think I've actually mis-interpreted the problem, but you've clarified it with the LaTeX in the above post.

I think the thing that's important for this example is separating whether something exists at a particular point (or distance) from the actual probability of detecting it. The two are different things.

So we first have a distribution for something being at a particular distance. This is probably best represented using a Markovian distribution where transitions only happen between neighbouring cells. If the object is assumed to be constantly moving, we can incorporate direction into the Markovian model.

Then alongside this you have a model for the actual detection. The detection itself is different because it's not always going to detect something even when it actually exists. If detection always takes place, there is no need for this distribution, since P(Detection) = 1.

So in your model you have given a model of detection, but for an object with some sort of motion, the distribution of detection should actually be more dynamic, especially if you use a detection scheme that makes use of the distribution of distance as well as the results of past detections.

For example, if you get a detection, then your detection distribution will optimize itself to consider positions that have a local relation to that detection. There's no point, for example, in checking a point 10 km away from a detection that was registered a moment before.

I have a few ideas of how you could design the detection scheme based on geometry and Markovian methods that look kind of like a queue distribution, in that you can only add or subtract a member of the queue, except that the number of members relates to a geometric interpretation involving a kind of distance.
 
  • #5
To make a versatile model of target detection, you would have to represent cases where the observer-target distance could change (or not change) in a variety of ways. But in the situation I'm considering, the motion of the target toward the detector is deterministic. I'm willing to assume that the target starts at the same initial distance on each test and moves toward the detector with the same distance-vs-time function on each test. (i.e. Assume the [itex] (t_i, x_i) [/itex] data are known and the same in all replications of the test.)

The only stochastic aspect of the test is the range at which the (first) detection happens. I'm assuming that the distribution of this range (over the population of tests) is known and has cumulative distribution [itex] F(x) [/itex].

The problem is to determine the [itex] p_j [/itex] that will produce a distribution of first detection ranges that is consistent with [itex] F(x) [/itex]. (An interesting theoretical question is whether discrete simulations using steps approach a continuous random process as the distance intervals between the steps approach zero. And is there some continuous function [itex] p(x) [/itex] which is, in some sense, the limit of the [itex] p_j [/itex]?)
 
  • #6
So in your model, do you assume that the detection at each "cell" is independent of every other point, or does it depend on the realized results of past cells?

It seems that the initial position and trajectory in your model are deterministic, but that the detection probability over some interval is governed by the distribution F(x).

The caveat though is the nature of the detection device which I hinted at in an earlier post. You can have quite a variety of detection methods and algorithms based on the uncertainty properties present in the actual physical detection.

It's these uncertainties, when quantified in the physical detection process including its error, that give rise to the use of specific protocols. Think of it as the analog of constructing error-correcting codes for communication channels with very specific noise properties: the code itself in the detector represents the detection protocol.

So the first thing to do is to start out with some error properties of physical detection. You could break it up into cells, do fancy geometry and whatever, but the basic model is to ask: given some cell (or an interval, in continuous space), what is the probability of detection in that cell in the physical sense, if we just want to send out a signal and see if it, for lack of a better term, 'bounces back' (kind of like a ping in network transmission)?

You then construct a detection protocol that will actually send out signals in a very specific way so as to guarantee some probability given the constraints of the trajectory, the geometry, and the actual detection device, just as you do for a communications system with ECCs.

In this context, it's not only the physical detection itself, but the protocol that determines the final distribution.
 
  • #7
chiro said:
So the first thing to do is to start out with some error properties of physical detection.

That's good advice for creating a useful model of detection, but in this particular thread I have a more theoretical outlook. Having retired, I am (happily) not thinking about realistic models of detection for particular systems. I'm merely recalling all the horribly wrong analyses and simulation code that I have seen, based on tests like the one I described. I'm wondering how hard it would be to come up with a simple model for a moment-by-moment simulation that gives results consistent with a given range distribution F(x).

You could break it up into cells, do fancy geometry and whatever, but the basic model is to ask: given some cell (or an interval, in continuous space), what is the probability of detection in that cell in the physical sense, if we just want to send out a signal and see if it, for lack of a better term, 'bounces back' (kind of like a ping in network transmission)?

The natural "cells"in this problem are simply the intervals of distance between the target and the detector. There are simple "physical models of detection that give the [itex] p_i [/itex] in terms of physical parameters like the radar cross section of the target, the pulse rate of the radar, it's signal strength at the given range, etc. However, if the task is to duplicate F(x) then one must solve the problem of finding some physical parameters that produce results that are consistent with F(x). Is that the approach you have in mind? That doesn't really answer the mathematical question I'm asking, namely: Does F(x) (alone) determine a unique set of [itex] p_i [/itex] in the very limited test scenario that I described?
 
  • #8
If you have an empirical CDF for the range at first detection, F(x), how about simply drawing a Uniform(0,1) pseudo-random number u and then solving F(x) = u for x? Then x is your simulated range. You would probably need to use numerical methods to solve the equation, since you don't have a nice formula for F.
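A minimal sketch of that inverse-CDF approach in Python (scipy's brentq root-finder stands in for "numerical methods"; the curve F here is an invented stand-in for an empirical fit):

[code]
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)

# Hypothetical forward CDF of detection range (Weibull-like, made up),
# standing in for a curve fitted to a cumulative histogram.
def F(x):
    return 1.0 - np.exp(-((x / 40.0) ** 2))

def draw_range():
    """One simulated detection range by inverse-CDF sampling."""
    u = rng.random()
    # Solve F(x) = u on a bracket wide enough to contain the root.
    return brentq(lambda x: F(x) - u, 0.0, 1e3)

samples = np.array([draw_range() for _ in range(10_000)])
print(samples.mean(), np.quantile(samples, [0.25, 0.5, 0.75]))
[/code]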
 
  • #9
awkward said:
If you have an empirical CDF for the range at first detection, F(x), how about simply drawing a Uniform(0,1) pseudo-random number u and then solving F(x) = u for x?

That is not relevant to the question being asked. The question is how to implement a simulation that proceeds in a series of steps and makes an independent random draw of a random variable on each step - hence the title of the thread.
 
  • #10
So the CDF is sort-of backwards: [itex]F(0) = 1[/itex], [itex]F(\infty) = 0[/itex].
If there's a step of width [itex]\delta x[/itex] at range [itex]x[/itex], we want the probability of triggering there (and no sooner) to be [itex]-F'(x)\,\delta x[/itex] (the area under the PDF for that step).
But (if we've modeled it correctly) there's a probability [itex]F(x)[/itex] that we don't get that far in the simulation, so we have to amplify the trigger probability to [itex]-F'(x)\,\delta x/(1-F(x))[/itex].
 
  • #11
haruspex said:
So the CDF is sort-of backwards: [itex]F(0) = 1[/itex], [itex]F(\infty) = 0[/itex].
For a mathematical discussion, it might be simpler to let F(x) be a non-backwards CDF. You are correct that my remark about F(x) being mis-interpreted as a probability of detection vs range is only plausible for a backwards CDF - one that is near 1.0 for short ranges. We can discuss the problem using either convention.

If there's a step of width [itex]\delta x[/itex] at range [itex]x[/itex], we want the probability of triggering there (and no sooner) to be [itex]-F'(x)\,\delta x[/itex] (the area under the PDF for that step).
But (if we've modeled it correctly) there's a probability [itex]F(x)[/itex] that we don't get that far in the simulation, so we have to amplify the trigger probability to [itex]-F'(x)\,\delta x/(1-F(x))[/itex].

As I understand that reasoning:
[itex] F(x) = [/itex] pr( first detection occurs at range [itex] \ge x [/itex] )
pr( first detection occurs in the interval [itex] [x, x + \delta x] [/itex] given no first detection at any range [itex] > x + \delta x [/itex] )
= pr( first detection occurs in that interval ) / pr( no first detection at any range [itex] > x + \delta x [/itex] )
= [itex] (F(x) - F(x+\delta x))/(1 - F(x+\delta x)) \approx -F'(x)\,\delta x/(1 - F(x)) [/itex]
 
  • #12
Stephen Tashi said:
As I understand that reasoning:
Yes, but on second thoughts it would be more practical to write it as [itex](F(x-\delta x) - F(x))/(1 - F(x))[/itex].
This gives an exact total probability of 1, however large [itex]\delta x[/itex] is.
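To check that rule numerically, here is a sketch (Python; the backwards curve F and all the numbers are made up). Each run draws a Bernoulli variable per step with the exact conditional probability above, and the resulting survival curve is compared against F:

[code]
import numpy as np

rng = np.random.default_rng(3)

# "Backwards" CDF as in this thread: F(x) = P(first detection at
# range >= x), so F(0) = 1 and F(inf) = 0.  The curve is invented.
def F(x):
    return np.exp(-((x / 40.0) ** 2))

x_start, dx, trials = 150.0, 0.5, 100_000
xs = np.arange(x_start, 0.0, -dx)       # target closes from x_start to 0

det_range = np.full(trials, np.nan)
alive = np.ones(trials, dtype=bool)     # runs with no detection yet
for x in xs:
    # Exact conditional trigger probability for the step from x to x - dx.
    p = (F(x - dx) - F(x)) / (1.0 - F(x))
    fire = alive & (rng.random(trials) < p)
    det_range[fire] = x - dx            # range at first detection
    alive &= ~fire

# The fraction of runs detected at range >= x should match F(x).
for x in (80.0, 40.0, 20.0):
    print(x, (det_range >= x).mean(), F(x))
[/code]

Note that the per-step probabilities telescope, so the final step (where [itex]F(0) = 1[/itex]) fires with probability 1 and every run produces a detection range.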
 
  • #13
haruspex said:
Yes, but on second thoughts it would be more practical to write it as [itex](F(x-\delta x) - F(x))/(1 - F(x))[/itex].
This gives an exact total probability of 1, however large [itex]\delta x[/itex] is.

I wonder if there is a way to give mathematical meaning to the phrase "A random process implemented by a random draw to determine success or failure at each distance until a success occurs". (i.e. take the limit (in some sense) as [itex] \delta x \rightarrow 0 [/itex], but get something interesting instead of zero)

Thinking of the exponential distribution, and thinking of the variable as time instead of distance, the associated Poisson distribution can be viewed as a limiting distribution of binomials, which we can imagine as making independent random draws at finer and finer intervals of time.
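Spelling out that limit: with [itex]n[/itex] independent draws of probability [itex]\lambda t/n[/itex] in [itex][0,t][/itex],

[tex]
\binom{n}{k}\left(\frac{\lambda t}{n}\right)^{k}\left(1-\frac{\lambda t}{n}\right)^{n-k}
\;\longrightarrow\; e^{-\lambda t}\,\frac{(\lambda t)^k}{k!}
\quad\text{as } n \to \infty,
[/tex]

so in particular pr( no event by time [itex]t[/itex] ) [itex]\rightarrow e^{-\lambda t}[/itex], which is the "backwards" cumulative of the exponential.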

Suppose I think of F(x) as analogous to the "backwards" cumulative of an exponential.

If we imagine F(x) approximated (better and better) by a sum of exponentials, can we define an associated process analogous to the Poisson?

Or can I think of F(x) as being implemented by "an exponential with a continuously varying parameter [itex] \lambda(x) [/itex]"?
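Making that last idea precise (with the backwards convention [itex] F(x) = [/itex] pr( first detection at range [itex] \ge x [/itex] )): the per-step probability from post #12 has a well-defined limit per unit distance,

[tex]
\lambda(x) \;=\; \lim_{\delta x \to 0}\frac{1}{\delta x}\cdot\frac{F(x-\delta x)-F(x)}{1-F(x)}
\;=\; \frac{-F'(x)}{1-F(x)},
\qquad
1-F(x) \;=\; \exp\!\left(-\int_x^{\infty}\lambda(s)\,ds\right),
[/tex]

which is exactly the failure-rate (hazard) function of reliability theory, with distance in place of time.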
 
  • #14
Stephen Tashi said:
I wonder if there is a way to give mathematical meaning to the phrase "A random process implemented by a random draw to determine success or failure at each distance until a success occurs". (i.e. take the limit (in some sense) as [itex] \delta x \rightarrow 0 [/itex], but get something interesting instead of zero)
You could certainly give it a physical implementation, so a mathematical one follows. Imagine operating a Geiger counter with a continuously variable source (or shielding, say), with the r.v. being the time to first click. Is that the sort of process you have in mind?
 
  • #15
haruspex said:
You could certainly give it a physical implementation, so a mathematical one follows. Imagine operating a Geiger counter with a continuously variable source (or shielding, say), with the r.v. being the time to first click. Is that the sort of process you have in mind?

Yes, that's a physical version of what I have in mind and the mathematical question is whether one can take a somewhat arbitrary stochastic process and view it that way.

Perhaps the mathematical theory of reliability already does this in terms of the "failure rate" function (http://en.wikipedia.org/wiki/Failure_rate), but the quality-control orientation of that material isn't very inspiring.

I have in mind some analogy to the Wiener process. I visualize the Wiener process as the limit of processes, each of which involves independent random draws at small intervals (usually intervals of time), where the limit is taken by making both the size of the intervals and the effect of the random draws smaller and smaller.

In target detection simulations, the time to detect a target under "fixed conditions" is often modeled by an exponential distribution with a parameter [itex] \lambda [/itex] that is a function of the conditions. If we take that model as the basis for simulating what happens when a target moves toward a detector, then the conditions would be changing continuously (since range is one of the important variables).
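A sketch of that idea in Python (every number and the particular [itex]\lambda[/itex] are made up): treat detection as the first event of a nonhomogeneous Poisson process whose rate rises as the range falls, and sample the first-event time by inverting the integrated rate:

[code]
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical detection rate: the range shrinks linearly with time,
# and the rate (detections per unit time) rises as the range falls.
def lam(t):
    x = np.maximum(100.0 - 5.0 * t, 0.0)   # range (km) at time t
    return 0.5 * np.exp(-x / 30.0)

# Integrated rate Lambda(t) on a grid (trapezoid rule).
t_grid = np.linspace(0.0, 20.0, 4001)
cum = np.concatenate(([0.0], np.cumsum(
    0.5 * (lam(t_grid[1:]) + lam(t_grid[:-1])) * np.diff(t_grid))))

def first_detection_time():
    """T = Lambda^{-1}(E) with E ~ Exp(1): first event of the
    nonhomogeneous process, or inf if none occurs by t = 20."""
    e = rng.exponential(1.0)
    if e > cum[-1]:
        return np.inf
    return np.interp(e, cum, t_grid)

times = np.array([first_detection_time() for _ in range(10_000)])
print(np.median(times[np.isfinite(times)]))
[/code]

The per-step Bernoulli picture from earlier in the thread is recovered by taking the step trigger probability to be [itex]\lambda(t)\,\delta t[/itex] for small [itex]\delta t[/itex].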
 

