Obtaining samples of an RV with a given CDF

  • Thread starter chroot
  • Start date
  • Tags
    Cdf
In summary, Warren found that interpolating the cdf with a spline yielded poor results due to slowness and the inability to sample the cdf when it was horizontal. He found a faster interpolation method but it required precomputing the spline. There are other interpolation routines available on the MATLAB file exchange that might help.
  • #1
chroot
Staff Emeritus
Science Advisor
Gold Member
10,295
41
Hey guys,

I know the cdf of a random process, and I want to obtain samples of it computationally. (In MATLAB, specifically.) How can I go about doing this?

The naive approach I took was to put the cdf into a variable, and then use interpolation of the inverse function. More specifically, say the distribution was this:

Code:
angles = [ 0 10 20 30 40 50 60 70 80 90 ];
cdf = [ 0 0.3 0.4 0.6 0.8 0.88 0.92 0.94 0.99 1.0 ];

To find a sample of this random process, I do something like this:

Code:
angle = interp1(cdf, angles, rand, 'spline');

This seems to work reasonably well, but it has some flaws.

First, it's pretty slow, and is a bottleneck in my program. Yes, linear interpolation is a little faster, but still much slower than what I'd like. I tried a quick 'qinterp1' function I found at the MATLAB file exchange, but could not get it to behave correctly.

Second, it will not work if the cdf is ever horizontal. In other words, the cdf [ 0 0.5 1.0 1.0 ... ] won't work, because interp1 can only work when the x values are distinct.

Is there some better way to do this? I remember seeing a mathematical explanation of how to get samples of a random process with a known cdf in a statistics class, but I suspect it was different than what I'm doing. If I'm doing it the right way conceptually, is there a way to speed up the computation? I have to do this many billions of times, so even just doing some precomputation would be enormously beneficial.

- Warren
 
Physics news on Phys.org
  • #2
If you're simulating from a discrete distribution then there are other specialized routines on the file exchange that might help.

Also if the distributions don't change at every step then you could vectorize to reduce overhead, e.g. with "rand(1e6,1)" instead of "rand".
 
  • #3
Actually, the cdf is continuous, but is not described analytically. Instead I took samples of the cdf.

- Warren
 
  • #4
Pick a random number, r, uniformly from (0,1) and then perform a binary search in your CDF array for the highest index whose entry is <= r.
 
  • #5
chroot said:
Actually, the cdf is continuous, but is not described analytically. Instead I took samples of the cdf.

Ok. Looks you're using equally spaced x points. Instead try choosing the x values so that the cdf points are more equally spaced (more work, but this is only done in the preprocessing). At the places where the cdf is horizontal choose the single x value you think is most appropriate, and add more cdf points nearby to make the spline more accurate.

Also with each call of interp1 there would be a lot of overhead for computing the spline. You might be able to speed things up significantly by precomputing it as described in the documentation for interp1.

In general cdf/quantile interpolation can be problematic though, and extra care should be taken with the endpoints especially if the tails are infinite.

Hope this helps.
 

Related to Obtaining samples of an RV with a given CDF

1. How do you obtain samples of an RV with a given CDF?

To obtain samples of a random variable with a given cumulative distribution function (CDF), you can use a variety of methods such as inverse transform sampling, acceptance-rejection sampling, and Markov chain Monte Carlo methods.

2. What is inverse transform sampling?

Inverse transform sampling is a method for generating random samples from a given probability distribution by using the inverse of its cumulative distribution function (CDF).

3. How does acceptance-rejection sampling work?

Acceptance-rejection sampling is a method for generating random samples from a given probability distribution by using a proposal distribution and accepting or rejecting samples based on a comparison of their probability values.

4. What is Markov chain Monte Carlo (MCMC) sampling?

MCMC sampling is a general method for generating random samples from a given probability distribution by constructing a Markov chain that has the desired distribution as its equilibrium distribution.

5. Can you explain the importance of obtaining samples from a given CDF?

Obtaining samples from a given CDF is important in many scientific fields, as it allows us to study and analyze the behavior of random variables and make predictions about future outcomes. This is particularly useful in fields such as statistics, economics, and physics.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
16
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
3K
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
2K
  • Calculus and Beyond Homework Help
Replies
4
Views
9K
  • MATLAB, Maple, Mathematica, LaTeX
Replies
1
Views
796
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
967
  • MATLAB, Maple, Mathematica, LaTeX
Replies
1
Views
1K
Replies
9
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
7
Views
2K
  • Electrical Engineering
Replies
2
Views
1K
Back
Top