Order Statistics: CDF Calculation for i.i.d. Random Variables

  • #1
EngWiPy
Suppose I have the random variables ##Z_k=X_k/Y_k## with a PDF ##f_{Z_k}(z_k)## for ##k=1,\,2\,\ldots, K##, where ##\{X_k, Y_k\}## are i.i.d. random variables. I can find

[tex]\text{Pr}\left[\sum_{k=1}^3 Z_k\leq \eta\right][/tex]

as

[tex]\int_{z_3=0}^{\eta}\int_{z_2=0}^{\eta-z_3}\int_{z_1=0}^{\eta-z_3-z_2}f_{Z_1}(z_1)f_{Z_2}(z_2)f_{Z_3}(z_3)\,dz_1dz_2dz_3[/tex]

Now suppose I arrange the random variables ##\{X_k\}_{k=1}^K## as

[tex]X_{(1)}\leq X_{(2)}\leq \cdots \leq X_{(K)}[/tex]

and define ##G_k=X_{(k)}/Y_{i_k}##, where ##i_k## is the index of the ##k^{\text{th}}## order statistic.

How can I find

[tex]\text{Pr}\left[\sum_{k=1}^3 G_k\leq \eta\right][/tex]
 
  • #2
EngWiPy said:
where ##\{X_k, Y_k\}## are i.i.d. random variables
This doesn't make it clear what is identically distributed to what, or what is independent of what.

The maximal interpretation is that all ##X_k## and all ##Y_k## have the same distribution, with PDF ##f_X##, and that the set of random variables ##\{X_1,...,X_K,Y_1,...,Y_K\}## is mutually independent. Or it could mean that ##X_k## has the same distribution as ##Y_k## and the two are independent of each other, but the distributions may differ from that of ##X_j## for ##j\neq k##, and there may be dependencies between ##X_k## and ##X_j##. There are a variety of possible interpretations.

I will assume that the set of all ##2K## random variables is mutually independent, and that all ##X_k## have the same PDF ##f_X## and all the ##Y_k## have the same PDF ##f_Y##, but the two PDFs may differ from one another. I will also assume that the support of all the random variables is the positive reals, which is implied by the integral you have written.

With those assumptions, the distribution of ##X_{(k)}/Y_{i_k}## is the same as that of ##X_{(k)}/Y_{j}## for any ##j##. Then the CDF of ##G_k## satisfies:

$$ F_{G_k}(a) = \int_0^\infty f_Y(y)\int_0^{ay}f_{X_{(k)}}(x) \,dx\,dy$$

The PDF ##f_{X_{(k)}}## of the ##k##th order statistic of ##X## can be calculated using the standard formula (see the Wikipedia section on order statistics).
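For reference, that standard single-order-statistic density is

$$f_{X_{(k)}}(x) = \frac{K!}{(k-1)!\,(K-k)!}\left[F_X(x)\right]^{k-1}\left[1-F_X(x)\right]^{K-k}f_X(x)$$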

The PDF ##f_{G_k}## of ##G_k## is obtained by differentiating the above expression.
Having obtained that, we calculate the CDF of ##G_1+G_2+G_3## using the triple integral you wrote above, but replacing each ##f_{Z_k}## by ##f_{G_k}##.
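As a quick numerical check of the first step, here is a small sketch (my own illustrative code, assuming ##K=3##, ##k=1##, and exponential(1) variables, for which ##F_{X_{(1)}}(x)=1-e^{-3x}## and the CDF of ##G_1## has the closed form ##1-\frac{1}{1+3a}##):

```python
import math

# Hedged numerical check, with illustrative assumptions:
# K = 3 exponential(1) Xs and k = 1, so F_{X_(1)}(x) = 1 - e^{-3x}, and
# F_{G_1}(a) = int_0^inf f_Y(y) F_{X_(1)}(a y) dy, evaluated by midpoint rule.
def F_G1(a, n=4000, y_max=40.0):
    dy = y_max / n
    return sum(math.exp(-y) * (1 - math.exp(-3 * a * y)) * dy
               for y in ((i + 0.5) * dy for i in range(n)))

# closed form for this special case is 1 - 1/(1 + 3a)
print(F_G1(1.0), 1 - 1 / (1 + 3))
```

The two printed numbers should agree to about three decimal places.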
 
  • #3
What I meant by ##\{X_k, Y_k\}## being i.i.d. is that every single random variable is independent of all the others, and all have the same distribution. I could have written that the random variables ##\{X_k\}_{k=1}^{2K}## are i.i.d., but because I need to separate the numerator and denominator, I used different symbols. These random variables are exponential, so yes, the support is the positive reals. I wasn't clear about that. My mistake.

andrewkirk said:
...
$$ F_{G_k}(a) = \int_0^\infty f_Y(y)\int_0^{ay}f_{X_{(k)}}(x) \,dx\,dy$$

The PDF ##f_{X_{(k)}}## of the ##k##th order statistic of ##X## can be calculated using the standard formula (see the Wikipedia section on order statistics).

The PDF ##f_{G_k}## of ##G_k## is obtained by differentiating the above expression.
Having obtained that, we calculate the CDF of ##G_1+G_2+G_3## using the triple integral you wrote above, but replacing each ##f_{Z_k}## by ##f_{G_k}##.
OK, so we can average over each ##G_k## as if the random variables were independent with no ordering, but what about the limits? I mean, ##G_3=\frac{X_{(3)}}{Y_3}## has the limits ##0## to ##\eta##, but ##G_2## contains ##X_{(2)}##, which is less than or equal to ##X_{(3)}## in ##G_3##. Or does this not affect the evaluation?
 
  • #4
EngWiPy said:
we can average over each ##G_k## as if the random variables were independent with no ordering, but what about the limits? I mean, ##G_3=\frac{X_{(3)}}{Y_3}## has the limits ##0## to ##\eta##, but ##G_2## contains ##X_{(2)}##, which is less than or equal to ##X_{(3)}## in ##G_3##. Or does this not affect the evaluation?
You are right to point out that the new ##G_k##s are not mutually independent, because they are based on order statistics. So we have to change the integrand in your triple integral, because it is based on the Zs being independent.

Instead we'll have to do (take a deep breath!):
\begin{align*}
\text{Pr}&\left[\sum_{k=1}^3 G_k\leq \eta\right]
=
\text{Pr}\left[\frac{X_{(1)}}{Y_1}+\frac{X_{(2)}}{Y_2}+\frac{X_{(3)}}{Y_3}\leq \eta\right]
\\
&=
\int_{y_1=0}^\infty \int_{x_1=0}^{y_1\eta}
\int_{y_2=0}^\infty \int_{x_2=0}^{y_2(\eta - x_1/y_1)}
\int_{y_3=0}^\infty \int_{x_3=0}^{y_3(\eta - x_1/y_1-x_2/y_2)}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
f_Y(y_1) f_Y(y_2) f_Y(y_3)
f_{X_{(1)},X_{(2)},X_{(3)}}(x_1,x_2,x_3)
\,dx_3\,dy_3\,dx_2\,dy_2\,dx_1\,dy_1
\\
&=
\int_{y_1=0}^\infty f_Y(y_1) \int_{x_1=0}^{y_1\eta}
\int_{y_2=0}^\infty f_Y(y_2) \int_{x_2=0}^{y_2(\eta - x_1/y_1)}
\int_{y_3=0}^\infty f_Y(y_3) \int_{x_3=0}^{y_3(\eta - x_1/y_1-x_2/y_2)}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
f_{X_{(1)},X_{(2)},X_{(3)}}(x_1,x_2,x_3)
\,dx_3\,dy_3\,dx_2\,dy_2\,dx_1\,dy_1
\end{align*}

It looks horrible, but it should simplify down if all your distributions are exponentials.

You can get the three-way joint probability function for the Xs from the wiki section I linked above. The formula is on the last line in the section, with ##n=3##.

EDIT: Actually the formulas in that section don't give exactly what we want, which is the joint density for the three smallest order stats, but we can infer a formula using the same principles. It should be:

$$\frac{K!}{(K-3)!}f_X(x_1)f_X(x_2)f_X(x_3) \left[1-F_X(x_3)\right]^{K-3}$$

where ##x_1\le x_2\le x_3##, otherwise zero. To enforce that inequality condition we need to change the limits for the X integrals, to get:
\begin{align*}
\text{Pr}&\left[\sum_{k=1}^3 G_k\leq \eta\right]
=
\int_{y_1=0}^\infty \int_{x_1=0}^{y_1\eta}
\int_{y_2=0}^\infty \int_{x_2=x_1}^{\max(x_1,y_2(\eta - x_1/y_1))}
\int_{y_3=0}^\infty \int_{x_3=x_2}^{\max(x_2,y_3(\eta - x_1/y_1-x_2/y_2))}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\frac{K!}{(K-3)!}
f_Y(y_1)f_Y(y_2)f_Y(y_3)
f_X(x_1)f_X(x_2)f_X(x_3) \left[1-F_X(x_3)\right]^{K-3}
\,dx_3\,dy_3\,dx_2\,dy_2\,dx_1\,dy_1
\\&=
\frac{K!}{(K-3)!}
\int_{y_1=0}^\infty f_Y(y_1) \int_{x_1=0}^{y_1\eta} f_X(x_1)
\int_{y_2=0}^\infty f_Y(y_2) \int_{x_2=x_1}^{\max(x_1,y_2(\eta - x_1/y_1))} f_X(x_2)
\int_{y_3=0}^\infty f_Y(y_3) \int_{x_3=x_2}^{\max(x_2,y_3(\eta - x_1/y_1-x_2/y_2))}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
f_X(x_3) \left[1-F_X(x_3)\right]^{K-3}
\,dx_3\,dy_3\,dx_2\,dy_2\,dx_1\,dy_1
\end{align*}
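As a cross-check on this construction, a direct Monte Carlo estimate of the left-hand side is cheap to compute. The following sketch (my own illustrative code, with ##K=5## and ##\eta=1## assumed, and exponential(1) distributions) estimates ##\text{Pr}\left[\sum_{k=1}^3 G_k\leq \eta\right]## by sampling:

```python
import random

# Hedged Monte Carlo sketch: sample K exponential(1) Xs, sort them, pair the
# three smallest with fresh exponential(1) Ys, and count how often the sum
# of ratios stays below eta. K and eta are illustrative choices.
random.seed(0)
K, eta, trials = 5, 1.0, 100_000
hits = 0
for _ in range(trials):
    xs = sorted(random.expovariate(1.0) for _ in range(K))
    g = sum(xs[k] / random.expovariate(1.0) for k in range(3))  # three smallest
    if g <= eta:
        hits += 1
print(hits / trials)  # estimate of Pr[G_1 + G_2 + G_3 <= eta]
```

The estimate can then be compared against the nested integral.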
 
  • #5
Thanks. As expected, the result is nasty. Could you explain how you arrived at the integrals (and their limits) in a little more detail?

Another but related question, can this CDF be evaluated numerically using any software tool for any value ##1\leq M\leq K## instead of 3?
 
  • #6
EngWiPy said:
Thanks. As expected, the result is nasty. Could you explain how you arrived at the integrals (and their limits) in a little more detail?
If all the distributions are exponentials, I think there's a reasonable chance you could evaluate this analytically, starting from the inside and working out. It's possible the max-es in the upper limits may throw a spanner in the works, but it's worth a try. If not then a good mathematical software tool should be able to numerically integrate it. If it complains about the six-deep nesting, it shouldn't be too hard to write a little program to do it, in something like Matlab, Python or R, assuming the exponential parameter and the value of ##\eta## are both known.

If we change 3 to some other M in 1...K, the same principles apply, but instead of six-deep nesting of integrals it will be 2M-deep! Maybe there's some formula that can reduce the number of integrals, but I can't see one from here.

Setting ##x_0=0## to make the pattern of limits clearer, the formula would be:

\begin{align*}
\text{Pr}&\left[\sum_{k=1}^M G_k\leq \eta\right]
=
\int_{y_1=0}^\infty \int_{x_1=x_0}^{\max\left(x_0,\,y_1\left(\eta - \sum_{k=1}^0 x_k/y_k\right)\right)}
\int_{y_2=0}^\infty \int_{x_2=x_1}^{\max\left(x_1,\,y_2\left(\eta - \sum_{k=1}^1 x_k/y_k\right)\right)}
\\
&\quad\quad\quad\quad\quad\quad\quad\quad\quad...
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad
\int_{y_M=0}^\infty \int_{x_M=x_{M-1}}^{\max\left(x_{M-1},\,y_M\left(\eta - \sum_{k=1}^{M-1} x_k/y_k\right)\right)}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\frac{K!}{(K-M)!}
f_Y(y_1)f_Y(y_2)...f_Y(y_M)
f_X(x_1)f_X(x_2)...f_X(x_M) \left[1-F_X(x_M)\right]^{K-M}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\,dx_M\,dy_M...\,dx_2\,dy_2\,dx_1\,dy_1
%
\\&=
\frac{K!}{(K-M)!}
\int_{y_1=0}^\infty f_Y(y_1)
\int_{x_1=0}^{\max\left(x_0,\,y_1\left(\eta - \sum_{k=1}^0 x_k/y_k\right)\right)} f_X(x_1)
\int_{y_2=0}^\infty f_Y(y_2)
\int_{x_2=x_1}^{\max\left(x_1,\,y_2\left(\eta - \sum_{k=1}^1 x_k/y_k\right)\right)} f_X(x_2)
\\
&\quad\quad\quad\quad\quad\quad\quad\quad\quad...
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad
\int_{y_M=0}^\infty f_Y(y_M)
\int_{x_M=x_{M-1}}^{\max\left(x_{M-1},\,y_M\left(\eta - \sum_{k=1}^{M-1} x_k/y_k\right)\right)} f_X(x_M)
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\left[1-F_X(x_M)\right]^{K-M}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\,dx_M\,dy_M...\,dx_2\,dy_2\,dx_1\,dy_1
\end{align*}

The complexity of the integral is nearly all in the limits. In the centre is just the joint PDF of all ##2M## variables. That is the product of ##M## PDFs for the ##M## independent Y random variables and the joint PDF for the ##M## Xs. The latter is converted to a product of ##M## independent X PDFs, with a combinatoric coefficient because of the different possible orderings, and the factor ##\left[1-F_X(x_M)\right]^{K-M}## that ensures that all the other ##K-M## X values are greater than ##X_{(M)}##.

The integration limits are set up from the outside in. The range of the Y variables is unconstrained, but each X variable must have an upper limit that prevents the sum of all the Gs covered by the integral signs so far from exceeding ##\eta##. Each X also needs a lower limit equal to the previous X value, to preserve the required ordering of the order statistics, and that in turn requires the ##\max## adjustment to the upper limit: if the lower limit would exceed the upper limit, we want zero rather than the negative number that a backwards integral could be read as giving.
 
  • #7
That makes sense. The complexity of the mathematical expression is not an issue. Eventually, I need to evaluate the CDF, and it doesn't matter how (except not by Monte-Carlo simulation, because I use that for verification). Of course, if an analytical solution is possible, that is great, because it would be easier to evaluate, but numerical integration is just fine. The analytical solution doesn't follow a pattern in the case of no ordering, so I doubt it would with ordering. I will try it again anyway.

Could you give me a hint on how to write a program in either MATLAB or Python to evaluate these ##2M## nested integrals? In MATLAB I think you cannot go more than two or maybe three levels of nesting. I need code that works for any value of M.
 
  • #8
For the general case of M rather than 3, the depth of nesting is unknown, so it's easiest to use recursion.
I wouldn't normally do this but, since I'm learning Python, I wrote the following Python code for practice. It is untested but should do the trick if you enter the necessary PDFs for the X and Y distributions in the function skeletons below.

The integration is bog-basic rectangular integration. If you want you can sex it up to trapezoidal or even Simpson integration for a more accurate result. But unless it takes a long time to run, it's probably easier to keep it rectangular and just increase the granularity.

Code:
import math

granularity = 1000
lim_y = 100
eta = 1.0  # threshold (set as needed)
M = 10  # order statistic
K = 30  # number of Xs (also the number of Ys)

def pdf_y(y):
  # define pdf function for y here (exponential(1) as an example)
  return math.exp(-y)

def pdf_x(x):
  # define pdf function for x here (exponential(1) as an example)
  return math.exp(-x)

def cdf_x(x):
  # define cdf function for x here (exponential(1) as an example)
  return 1 - math.exp(-x)

def permut(n, k):
  # calc number of permutations of k chosen from n: n!/(n-k)!
  perm = n
  j = 1
  while j < k:
    perm *= n - j
    j += 1
  return perm
# end of permutation function

def integ2(eta, x_prev, sum_x, m, k):
  y = 0
  integ_y = 0
  # integrate across this y_(M-m)
  while y < lim_y:
    y += dy
    lim_x = max(x_prev, y * (eta - sum_x))
    dx = (lim_x - x_prev) / granularity
    x = x_prev
    integ_x = 0
    # integrate across this x_(M-m)
    while x < lim_x:
       x += dx
       if m > 0:
         integrand = integ2(eta, x, sum_x + x / y, m - 1, k - 1)
       else:
         integrand = (1 - cdf_x(x)) ** k
       integ_x += integrand * pdf_x(x) * dx
    integ_y += integ_x * pdf_y(y) * dy
  return integ_y
# end of definition of function integ2

dy = lim_y / granularity
# call the function to find the probability
prob = permut(K, M) * integ2(eta, 0, 0, M - 1, K - 1)
print(prob)
 
  • #9
I will need to go over the math and the code in more detail, and compare the results with those of Monte-Carlo simulations at the end. I will report back. Thanks for your responses. They were very helpful.
 
  • #10
@andrewkirk Apparently, for ##M\geq 3## the CDF is too complicated to evaluate numerically, and simulations have to be done. But simulation isn't an efficient way to evaluate nested interdependent integrals. So, I am trying to find a closed-form solution for the special case of ##M=2##. In particular, I want to evaluate the following CDF (note that I evaluate the integrals in a different order, only to make the approach consistent with some other work I am doing. I think both are essentially the same.)

[tex]F_Z(z) = \text{Pr}\left[\frac{X_{(1)}}{Y_1}+\frac{X_{(2)}}{Y_2}\leq z\right] = \int_{y_2=0}^{\infty}\int_{x_2=0}^{y_2z}\int_{y_1=0}^{\infty}\int_{x_1=0}^{\min\left(x_2,\,y_1(z-\frac{x_2}{y_2})\right)}f_{(1),(2)}(x_1,x_2)f_Y(y_1)f_Y(y_2)\,dx_1\,dy_1\,dx_2\,dy_2[/tex]

where ##f_{(1),(2)}(x_1,x_2)## is the joint PDF of the order statistics ##X_{(1)}## and ##X_{(2)}##, and ##f_Y(y_k)## is the PDF of the random variable ##Y_k##.

The joint PDF ##f_{(1),(2)}(x_1,x_2)## can be written as (all random variables are i.i.d. exponential RVs with parameter 1)

[tex]f_{(1),(2)}(x_1,x_2) = K(K-1)e^{-x_1}e^{-(K-1)x_2}[/tex]
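As a quick sanity check (an illustrative numerical sketch, with ##K=5## assumed), this joint density should integrate to 1 over the region ##0 < x_1 < x_2##:

```python
import math

# Hedged check: numerically integrate K(K-1) e^{-x1} e^{-(K-1) x2}
# over 0 < x1 < x2 by nested midpoint rules. K = 5 is illustrative.
K = 5
n, x_hi = 500, 15.0
h = x_hi / n
total = 0.0
for i in range(n):
    x2 = (i + 0.5) * h
    m = 50                      # midpoint rule for the inner x1 integral
    d = x2 / m
    inner = sum(math.exp(-(j + 0.5) * d) * d for j in range(m))
    total += K * (K - 1) * math.exp(-(K - 1) * x2) * inner * h
print(total)  # close to 1
```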

Thus the original CDF can be written as

[tex]F_Z(z) = K(K-1)\int_{y_2=0}^{\infty}\int_{x_2=0}^{y_2z}\int_{y_1=0}^{\infty}\int_{x_1=0}^{\min\left(x_2,\,y_1(z-\frac{x_2}{y_2})\right)}e^{-x_1}e^{-(K-1)x_2}e^{-y_1}e^{-y_2}\,dx_1\,dy_1\,dx_2\,dy_2[/tex]

The innermost integral can be evaluated as

[tex]\int_{x_1=0}^{\min\left(x_2,\,y_1(z-\frac{x_2}{y_2})\right)}e^{-x_1}\,dx_1 = 1-\exp\left(-\min\left(x_2,\,y_1(z-\frac{x_2}{y_2})\right)\right)[/tex]

So

[tex]F_Z(z) = K(K-1)\left[I_1(z) - I_2(z)\right][/tex]

##I_1(z)## can be evaluated easily, but I got stuck in evaluating ##I_2## which is

[tex]I_2(z)=\int_{y_2=0}^{\infty}\int_{x_2=0}^{y_2z}\int_{y_1=0}^{\infty}\exp\left(-\min\left(x_2,\,y_1(z-\frac{x_2}{y_2})\right)\right)e^{-(K-1)x_2}e^{-y_1}e^{-y_2}\,dy_1\,dx_2\,dy_2[/tex]

I divided the integrals as follows

[tex]I_2(z) = \int_{y_2=0}^{\infty}\int_{y_1=0}^{\infty}\int_{x_2=0}^{\frac{zy_1y_2}{y_1+y_2}}e^{-Kx_2}e^{-y_1}e^{-y_2}\,dx_2\,dy_1\,dy_2+ \int_{y_2=0}^{\infty}\int_{y_1=0}^{\infty}\int_{x_2=\frac{zy_1y_2}{y_1+y_2}}^{y_2z}e^{-y_1(z-\frac{x_2}{y_2}+1)}e^{-(K-1)x_2}e^{-y_2}\,dx_2\,dy_1\,dy_2[/tex]

The limit ##\frac{zy_1y_2}{y_1+y_2}## makes the outer integrals difficult to evaluate. Is there another way or any tricks I can do to evaluate the original CDF in closed-form?
 
  • #11
I am not optimistic about being able to get a nice analytic formula (closed form), because of the complicated, inter-related limits. But I think we can get a reasonable algorithm for numerical integration, using the same technique as we discussed recently for doing nested integrals by iteration from the inside out.

It is based on the formulas I wrote above:

\begin{align*}
\text{Pr}&\left[\sum_{k=1}^M G_k\leq \eta\right]
=
\int_{y_1=0}^\infty \int_{x_1=x_0}^{\max\left(x_0,\,y_1\left(\eta - \sum_{k=1}^0 x_k/y_k\right)\right)}
\int_{y_2=0}^\infty \int_{x_2=x_1}^{\max\left(x_1,\,y_2\left(\eta - \sum_{k=1}^1 x_k/y_k\right)\right)}
\\
&\quad\quad\quad\quad\quad\quad\quad\quad\quad...
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad
\int_{y_M=0}^\infty \int_{x_M=x_{M-1}}^{\max\left(x_{M-1},\,y_M\left(\eta - \sum_{k=1}^{M-1} x_k/y_k\right)\right)}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\frac{K!}{(K-M)!}
f_Y(y_1)f_Y(y_2)...f_Y(y_M)
f_X(x_1)f_X(x_2)...f_X(x_M) \left[1-F_X(x_M)\right]^{K-M}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\,dx_M\,dy_M...\,dx_2\,dy_2\,dx_1\,dy_1
%
\\&=
\frac{K!}{(K-M)!}
\int_{y_1=0}^\infty f_Y(y_1)
\int_{x_1=0}^{\max\left(x_0,\,y_1\left(\eta - \sum_{k=1}^0 x_k/y_k\right)\right)} f_X(x_1)
\int_{y_2=0}^\infty f_Y(y_2)
\int_{x_2=x_1}^{\max\left(x_1,\,y_2\left(\eta - \sum_{k=1}^1 x_k/y_k\right)\right)} f_X(x_2)
\\
&\quad\quad\quad\quad\quad\quad\quad\quad\quad...
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad
\int_{y_M=0}^\infty f_Y(y_M)
\int_{x_M=x_{M-1}}^{\max\left(x_{M-1},\,y_M\left(\eta - \sum_{k=1}^{M-1} x_k/y_k\right)\right)} f_X(x_M)
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\left[1-F_X(x_M)\right]^{K-M}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\,dx_M\,dy_M...\,dx_2\,dy_2\,dx_1\,dy_1
\end{align*}

For ##k=1,...,M## define functions
##I_k^M:\mathbb R^2\to \mathbb R## and ##J_k^M:\mathbb R^3\to \mathbb R## such that:

##I_k^M(x_{k-1},s_{k-1}) = \int_0^\infty f_Y(y_k) J_k^M(x_{k-1},s_{k-1}, y_k)\,dy_k##

##J_k^M(x_{k-1},s_{k-1}, y_k) =
\begin{cases}
0&\mbox{, if }x_{k-1} \ge y_ks_{k-1}\textrm{, else}\\
\int_{x_{k-1}}^{y_ks_{k-1}} f_X(x_k)\left(1-F_X(x_k)\right)^{K-M}\,dx_k
&\mbox{, if } k=M \textrm{, else}\\
\int_{x_{k-1}}^{y_ks_{k-1}} f_X(x_k)
I_{k+1}^M(x_k, s_{k-1} - x_k/y_k)
\,dx_k
&\mbox{, otherwise}
\end{cases} ##

Note that ##s_k=s_0 - \sum_{j=1}^k x_j/y_j##.

Then by suitably constraining and segmenting the range of possible values for ##x, y, s## so that each range contains say ##N## points for ##x,y## and ##L>N## points for ##s##, we can calculate and store these functions as ##N\times L##-shaped tables for the ##I_k^M## and ##N\times L\times N##-shaped tables for the ##J_k^M##. We start with ##J_M^M## and then work outwards, at each stage using the table just determined to do the integration for the new function. For accuracy we should use Simpson's Rule, with an extra trapezoid at the end when the number of segments in the integration region is odd.

This approach will take longer to run than the one considered earlier, having the order of ##N^3L## iterations. But for any given value of ##K## and ##M##, it only needs to be run once, and only the table for ##I_1^M## needs to be retained. Then for any ##\eta\in\mathbb R^+## we can calculate:

$$\text{Pr}\left[\sum_{i=1}^M G_i\leq \eta\right]
= \frac{K!}{(K-M)!}I_1^M(0, \eta)$$

This can be programmed in a language like Python, following the same principles as the earlier example. It will be somewhat more complex because of the two extra dimensions, but eminently doable. I suggest starting with a very coarse grid until it is fully debugged, then running it overnight on a finer grid.
 
  • #12
Thanks, but I still don't get the implementation of the example you did in the other thread in R and Python without selection/ordering. I tried to start there because it is an easier case, and then come back here. But since I didn't understand the implementation there, I thought maybe I could find an analytical solution for the special case ##M=2##. I found an analytic solution when the ordering includes the whole fraction ##X_k/Y_k##. In that case, I just treat the whole fraction as a new RV, and thus have a double integral. But when the ordering is done on one of the RVs ##X_k## or ##Y_k##, it becomes much more complicated, and you need to consider the individual RVs.

Without ordering there were ##M-1## nested integrals (which would correspond to selecting ##M## out of ##K## random variables at random, and ##-1## because for the last integral we used the CDF); now there are ##2M## nested integrals (here we need to consider the joint PDF, and we cannot use the CDFs). I get the math and how to set the limits, but I don't understand the implementation. I do understand the recursion, but it is not efficient.

So, you think even for ##M=2## there is no analytic solution? In that case, doing a numerical integration for ##M=2## is not as much of a contribution as for a general ##M##. Probably I should read your other posts again.

Thanks for your responses.

PS: Just a side note, wouldn't it be easier if you considered the larger RVs first? In that case the lower limits for the integrals would be 0, and the upper limits would be min instead of max. It probably would make a difference for an analytic solution, but maybe not for a numerical one. Just a remark.
 
  • #13
Approaching it analytically with ##M=2## I can get one step further than was reached above, to where there are two integrals left to evaluate. But at that point it looks about to become horribly messy, because the next integration variable ##x_1## appears in both the numerator and denominator of exponents, as well as in the denominator of the integrand. Wolfram doesn't produce anything that looks easy to work with. FWIW, here's my development:

For ##M=2## and ##X,Y## both exponential with parameter 1, we have:
%
\begin{align*}
\text{Pr}&\left[\sum_{k=1}^M G_k\leq \eta\right]
=
\frac{K!}{(K-2)!}
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta} e^{-x_1}
\int_{y_2=0}^\infty e^{-y_2}
\int_{x_2=x_{1}}^{\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)} e^{-x_2}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
\left[1-\left(1-e^{-x_2}\right)\right]^{K-2}
\,dx_2\,dy_2\,dx_1\,dy_1 \\
% 2
&=
K(K-1)
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta} e^{-x_1}
\int_{y_2=0}^\infty e^{-y_2}
\int_{x_2=x_{1}}^{\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)} e^{-x_2}
e^{-x_2(K-2)}
\,dx_2\,dy_2\,dx_1\,dy_1 \\
% 3
&=
K(K-1)
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta} e^{-x_1}
\int_{y_2=0}^\infty e^{-y_2}
\int_{x_2=x_{1}}^{\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)}
e^{-x_2(K-1)}
\,dx_2\,dy_2\,dx_1\,dy_1
\\% 4
&=
\frac{K(K-1)}{K-1}
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta} e^{-x_1}
\int_{y_2=0}^\infty e^{-y_2}
\left[
-e^{-x_2(K-1)}
\right]_{x_2=x_{1}}^{\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)}
\,dy_2\,dx_1\,dy_1
%
\\% 5
&=
K
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta} e^{-x_1}
\int_{y_2=0}^\infty e^{-y_2}
\left(
e^{-x_1(K-1)}
-
e^{-\left(\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)\right)(K-1)}
\right)
\,dy_2\,dx_1\,dy_1
%
\\% 6
&=
K
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta} e^{-x_1}
\left(
e^{-x_1(K-1)}
-
\int_{y_2=0}^\infty e^{-y_2}
e^{-\left(\max\left(x_{1},\,y_2\left(\eta - x_1/y_1\right)\right)\right)(K-1)}
\,dy_2 \right)
\,dx_1\,dy_1
%
\\% 7
&=
K
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta} e^{-x_1}
\bigg(
e^{-x_1(K-1)}
- \int_{y_2=0}^{\frac{x_1}{\eta - x_1/y_1}} e^{-y_2}
e^{-x_1(K-1)} \,dy_2
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
- \int_{y_2={\frac{x_1}{\eta - x_1/y_1}}}^\infty e^{-y_2}
e^{-y_2\left(\eta - x_1/y_1\right)(K-1)} \,dy_2
\bigg)
\,dx_1\,dy_1
%
\\% 8
&=
K
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta} e^{-x_1}
\bigg(
e^{-x_1(K-1)}
+
e^{-x_1(K-1)}\bigg[e^{-y_2}\bigg]_{y_2=0}^{\frac{x_1}{\eta - x_1/y_1}}
\\&\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad
- \int_{y_2={\frac{x_1}{\eta - x_1/y_1}}}^\infty
e^{-y_2\left( 1 + \left(\eta - x_1/y_1\right)(K-1)\right)} \,dy_2
\bigg)
\,dx_1\,dy_1
%
\\% 9
&=
K
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta} e^{-x_1}
\bigg(
e^{-x_1(K-1)}e^{-\frac{x_1}{\eta - x_1/y_1}}
+ \frac{
\left[
e^{-y_2\left( 1 + \left(\eta - x_1/y_1\right)(K-1)\right)}
\right]_{y_2={\frac{x_1}{\eta - x_1/y_1}}}^\infty
}{1 + \left(\eta - x_1/y_1\right)(K-1)}
\bigg)
\,dx_1\,dy_1
%
\\% 10
&=
K
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta} e^{-x_1}
\bigg(
e^{-x_1(K-1)-\frac{x_1}{\eta - x_1/y_1}}
- \frac{
e^{-\frac{x_1}{\eta - x_1/y_1}\left( 1 + \left(\eta - x_1/y_1\right)(K-1)\right)}
}{1 + \left(\eta - x_1/y_1\right)(K-1)}
\bigg)
\,dx_1\,dy_1
%
\\% 11, using x_1/(eta - x_1/y_1) * (eta - x_1/y_1)(K-1) = x_1(K-1)
&=
K
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta}
\bigg(
e^{-x_1 K-\frac{x_1}{\eta - x_1/y_1}}
-
\frac{
e^{-x_1 K-\frac{x_1}{\eta - x_1/y_1}}
}{1 + \left(\eta - x_1/y_1\right)(K-1)}
\bigg)
\,dx_1\,dy_1
%
\\% 12
&=
K
\int_{y_1=0}^\infty e^{-y_1}
\int_{x_1=0}^{y_1\eta}
e^{-x_1 K-\frac{x_1}{\eta - x_1/y_1}}\,
\frac{\left(\eta - x_1/y_1\right)(K-1)}{1 + \left(\eta - x_1/y_1\right)(K-1)}
\,dx_1\,dy_1
\end{align*}

That's the point where I usually give up and take the numeric approach.
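For what it's worth, here is a hedged numerical sketch of the remaining double integral. Using ##\frac{x_1}{\eta - x_1/y_1}\left(\eta - x_1/y_1\right)(K-1) = x_1(K-1)## to combine the exponents, the bracketed term collapses to ##e^{-Kx_1-\frac{x_1}{\eta - x_1/y_1}}\,\frac{(\eta - x_1/y_1)(K-1)}{1+(\eta - x_1/y_1)(K-1)}## (my own algebra, worth double-checking). Evaluating at midpoints keeps ##\eta - x_1/y_1## strictly positive, which sidesteps the behaviour at the upper limit ##x_1 = y_1\eta##:

```python
import math

# Hedged numerical sketch: evaluate the remaining double integral for M = 2
# with exponential(1) X and Y. K, eta, and the grid sizes are illustrative.
def cdf_m2(eta, K, n=300, y_max=40.0):
    dy = y_max / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * dy                 # midpoint in y
        dx = y * eta / n
        inner = 0.0
        for j in range(n):
            x = (j + 0.5) * dx             # midpoint keeps x < y*eta strictly
            c = eta - x / y                # so c > 0: no division by zero
            inner += math.exp(-K * x - x / c) * c * (K - 1) / (1.0 + c * (K - 1)) * dx
        total += math.exp(-y) * inner * dy
    return K * total

print(cdf_m2(1.0, 3), cdf_m2(3.0, 3))
```

Grid sizes and the ##y## cutoff are illustrative; refine them as needed.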
 
  • #14
I reached almost the same result. You divided the integral over ##y_2##, while I divided it over ##x_2##. It seems that numerical integration is the way to go.
 
  • #15
I tried to do the numerical integrals for your result. If we write it as ##K## times a combination of two integrals ##I_1(\eta)## and ##I_2(\eta)##, then the first, ##I_1(\eta)##, was evaluated OK, but the second, ##I_2(\eta)##, gives me a division by zero error. I get the same error when I divide the integral over ##x_2## instead of ##y_2##!
 
  • #16
EngWiPy said:
I tried to do the numerical integrals for your result. If we write it as ##K## times a combination of two integrals ##I_1(\eta)## and ##I_2(\eta)##, then the first, ##I_1(\eta)##, was evaluated OK, but the second, ##I_2(\eta)##, gives me a division by zero error. I get the same error when I divide the integral over ##x_2## instead of ##y_2##!
I expect that's because the denominator ##(\eta - x_1/y_1)## in the exponent of the inner integrand goes to zero at the upper inner integration limit. The integral is an improper integral, but the integrand can be made continuous: change it to a branched function that returns zero when ##x_1 = y_1\eta## and the written expression otherwise.
 
  • #17
You are right. When I accounted for the case ##\eta - x_1/y_1 = 0##, I got the result without any warnings or errors. I thought the software took care of this, since ##e^{-\infty} = 0##! Apparently not. Thanks
 
  • #18
andrewkirk said:
For the general case of M rather than 3, the depth of nesting is unknown, so it's easiest to use recursion.
I wouldn't normally do this but, since I'm learning Python, I wrote the following Python code for practice. It is untested but should do the trick if you enter the necessary PDFs for the X and Y distributions in the function skeletons below.

The integration is bog-basic rectangular integration. If you want you can sex it up to trapezoidal or even Simpson integration for a more accurate result. But unless it takes a long time to run, it's probably easier to keep it rectangular and just increase the granularity.

Code:
import math

granularity = 1000
lim_y = 100
eta = 1.0  # threshold (set as needed)
M = 10  # order statistic
K = 30  # number of Xs (also the number of Ys)

def pdf_y(y):
  # define pdf function for y here (exponential(1) as an example)
  return math.exp(-y)

def pdf_x(x):
  # define pdf function for x here (exponential(1) as an example)
  return math.exp(-x)

def cdf_x(x):
  # define cdf function for x here (exponential(1) as an example)
  return 1 - math.exp(-x)

def permut(n, k):
  # calc number of permutations of k chosen from n: n!/(n-k)!
  perm = n
  j = 1
  while j < k:
    perm *= n - j
    j += 1
  return perm
# end of permutation function

def integ2(eta, x_prev, sum_x, m, k):
  y = 0
  integ_y = 0
  # integrate across this y_(M-m)
  while y < lim_y:
    y += dy
    lim_x = max(x_prev, y * (eta - sum_x))
    dx = (lim_x - x_prev) / granularity
    x = x_prev
    integ_x = 0
    # integrate across this x_(M-m)
    while x < lim_x:
       x += dx
       if m > 0:
         integrand = integ2(eta, x, sum_x + x / y, m - 1, k - 1)
       else:
         integrand = (1 - cdf_x(x)) ** k
       integ_x += integrand * pdf_x(x) * dx
    integ_y += integ_x * pdf_y(y) * dy
  return integ_y
# end of definition of function integ2

dy = lim_y / granularity
# call the function to find the probability
prob = permut(K, M) * integ2(eta, 0, 0, M - 1, K - 1)
print(prob)

Have you run this code? Why do you pass M-1 and not M in the initial call? Also, why do you pass K-1 (it is not needed)? I am trying to implement the mathematical integration numerically in Python (I did it in a slightly different way), but it is not working, and I am trying to figure out why. I wrote the recursive function mathematically as follows:

[tex]F_{m-1}(z_{m-2}) = \int_{y_{m-1}=0}^{\infty}\int_{x_{m-1}=x_{m-2}}^{\max(x_{m-2},\,y_{m-1}\,z_{m-2})}F_m\left(z_{m-2}-\frac{x_{m-1}}{y_{m-1}}\right)f_X(x_{m-1})f_Y(y_{m-1})\,dx_{m-1}\,dy_{m-1}[/tex]

where

[tex]F_{M}(z_{M-1}) = \int_{y_{M}=0}^{\infty}\int_{x_{M}=x_{M-1}}^{\max(x_{M-1},\,y_{M}\,z_{M-1})}\left[1-F_X(x_M)\right]^{K-M}f_X(x_{M})f_Y(y_{M})\,dx_{M}\,dy_{M}[/tex]

So, we need to find ##\frac{K!}{(K-M)!}F_1(z_0)##.

The code corresponding to this recursive function is

Python:
import math
K=3
M=2
dx = 0.01
dy = 0.01
y_lim = 100

#the factor [1 - F_X(x)]^(K-M), where F_X is the CDF of X
def F(x):
    return math.exp(-x*(K-M))

#The pdf of Xs and Ys
def f(u):
    return math.exp(-u)

def G(z, m, x_prev_lim):
    if m == M:
        res = 0
        ux = x_prev_lim
        uy = 0
        while uy < y_lim:
            uy += dy
            resx = 0
            while ux <= max(x_prev_lim, uy*z):
                ux += dx
                resx += F(ux)*f(ux)*dx
            res += resx*f(uy)*dy
 
    else:
        res = 0
        ux = x_prev_lim
        uy = 0
        while uy < y_lim:
            uy += dy
            resx = 0
            while ux <= max(x_prev_lim, uy*z):
                ux += dx
                resx += G(z - (ux/uy), m+1, ux)*f(ux)*dx
            res += resx*f(uy)*dy
    return res

print((math.factorial(K)/math.factorial(K-M))*G(1, 1, 0))

Where did I make a mistake?
 
  • #19
EngWiPy said:
Why do you pass M-1 and not M to the original function?
Because there are M nested integrals if the innermost integrand is a PDF, but in the above code the innermost integrand is a CDF, which is the integral of a PDF, so we use only M-1 nested integrals.

Here is that innermost integrand:
Code:
integrand = (1 - cdf_x(x)) ^ k
Also why do you pass K-1 (it is not needed)?
The K is used as the exponent in that innermost integrand.

I am away from home till Monday, and hence away from my Python compiler. But I hope the above is of some help.
 
  • #20
OK, that makes sense when the innermost double integral (##F_M(z_{M-1})## in the above equation) is evaluated numerically. Otherwise, the CDF of ##X## (which appears to be what you are referring to) is part of the joint PDF of the order statistics, and thus we still have ##M## nested double integrals, right?

Regarding the argument ##k-1##: in the mathematical formulation, ##K## appears only once, in evaluating ##F_M(z_{M-1})##, and it is constant. I wonder why you included it as an argument.
 
  • #21
EngWiPy said:
Regarding the argument ##k-1##, in the mathematical formulation, ##K## appears only once in evaluating ##F_M(z_{M-1})##, and it is constant. I wonder why you included it as an argument.
When it is used, its value is K-M, because k starts at M-1 and is decremented by 1 in each recursive call, of which there are K-1.
Two alternative ways we could have done this are:
  • pass K-M as a parameter, all the way down through the recursion and use it as the exponent in the base case, or
  • not pass it as a parameter and instead use the global variables K and M in the base case. I try to avoid doing that, as it is considered bad programming practice (not 'modular') to use variables inside a function that were not passed as parameters. But I see I have already broken that principle by using the globals dy and lim_y.
 
  • #22
OK, I see what you were doing. I defined M and K as global variables in my code, which is why I didn't need them as arguments. Otherwise, the code is clear, and I used it with some modifications based on my formulation. The logic seems fine, but the final result is way smaller than that of the corresponding Monte-Carlo simulations! Is approximating the upper limit ##\infty## for the Y RVs with 100 good enough? I went to 1000, but the running time was way too long.
 
  • #23
The way to make this tractable is to define a function ##H_k## that is the composition of the functions ##I^M_k## and ##J^M_k## that I defined in post 11, then the integral is the composition
$$\frac{K!}{(K-M)!} H_1 \circ H_2 \circ H_3 \circ ... \circ H_M (0,\eta)$$

The function ##H_k## is given by:

$$H_k(x_{k-1},s_{k-1}) = \int_0^\infty f_Y(y_k) \int_{x_k=x_{k-1}}^{\max(x_{k-1},y_k s_{k-1})} f_X(x_k) H_{k+1} (x_k, s_{k-1}-x_k/y_k)\,dx_k\,dy_k$$
if ##k<M## and
$$H_k(x_{k-1},s_{k-1}) = \int_0^\infty f_Y(y_k) \int_{x_k=x_{k-1}}^{\max(x_{k-1},y_k s_{k-1})} f_X(x_k)
(1-F_X(x_M))^{K-M}
\,dx_k\,dy_k$$
otherwise.

Start by numerically evaluating values of ##H_M(x_{M-1},s_{M-1})## for a reasonably finely grained 2D table of ##x_{M-1},s_{M-1}##.

Then for each ##k## from ##M-1## down to 1, use the 2D table from the previous step to calculate a new 2D table of values for
##H_k(x_{k-1},s_{k-1})##.

For the last case of ##k=1## you need only calculate a 1D table, setting ##x_0=0##. The first parameter ##s_0## is the outermost integration limit ##\eta##.

Then multiply that last (1D) table of ##H_1(0,\eta)## by ##K!/(K-M)!## to get the vector of definite integrals that you seek.
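A rough sketch of this tabulation scheme, for ##K=3##, ##M=2## with Exp(1) variables (the grid steps, limits, and nearest-neighbour lookup below are illustrative assumptions, not the thread's final settings):

```python
import math

K, M = 3, 2
L = 5.0            # upper integration limit for x and y (illustrative)
dx = dy = 0.1      # coarse grid so the sketch runs quickly
eta = 1.0          # threshold

def f(u):          # common PDF of X and Y: Exp(1)
    return math.exp(-u)

def surv_pow(x):   # (1 - F_X(x))^(K-M)
    return math.exp(-x * (K - M))

xs = [i * dx for i in range(int(round(L / dx)) + 1)]    # grid for x_{k-1}
ss = [i * dx for i in range(int(round(eta / dx)) + 1)]  # grid for s_{k-1}

def lookup(table, x, s):
    # nearest-neighbour lookup into the 2D table (crude but simple)
    i = min(int(round(x / dx)), len(xs) - 1)
    j = min(max(int(round(s / dx)), 0), len(ss) - 1)
    return table[i][j]

def h_base(xp, sp):
    # H_M(x_{M-1}, s_{M-1}): base case with the (1-F_X)^(K-M) integrand
    res, y = 0.0, dy
    while y < L:
        xhi = max(xp, y * sp)
        part, x = 0.0, xp
        while x < xhi:
            part += f(x) * surv_pow(x) * dx
            x += dx
        res += f(y) * part * dy
        y += dy
    return res

def h_step(xp, sp, next_table):
    # H_k(x_{k-1}, s_{k-1}) for k < M, using the tabulated H_{k+1}
    res, y = 0.0, dy
    while y < L:
        xhi = max(xp, y * sp)
        part, x = 0.0, xp
        while x < xhi:
            part += f(x) * lookup(next_table, x, sp - x / y) * dx
            x += dx
        res += f(y) * part * dy
        y += dy
    return res

# Tabulate H_M over the 2D grid, then fold down to H_1(0, eta).
table_M = [[h_base(x, s) for s in ss] for x in xs]
scale = math.factorial(K) / math.factorial(K - M)
result = scale * h_step(0.0, eta, table_M)
print(result)
```

With grids this coarse the value is only indicative, but the structure (tabulate the innermost ##H_M##, then fold outward) is the point of the scheme: each ##H_k## is computed once per grid point instead of once per call.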

You should be able to choose suitable granularity and integration limits (for ##y_k##) such that the calculation runs fast and accurately.

I recommend you use Simpson integration, as discussed previously. Where there is an odd number of subintervals to be integrated (composite Simpson needs an even count), use Simpson on all but the last subinterval, with an extra trapezium on the end.
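A composite-Simpson helper along those lines might look like this (a sketch; the function name and the x² example are illustrative):

```python
def simpson_with_tail(fvals, h):
    """Composite Simpson on equally spaced samples fvals, spacing h.
    Simpson pairs up subintervals, so it needs an even subinterval
    count; if the count is odd, the last subinterval is handled with
    a single trapezium instead."""
    n = len(fvals) - 1                 # number of subintervals
    if n <= 0:
        return 0.0
    m = n if n % 2 == 0 else n - 1     # largest even subinterval count
    total = 0.0
    for i in range(0, m, 2):
        total += (h / 3.0) * (fvals[i] + 4.0 * fvals[i + 1] + fvals[i + 2])
    if n % 2 == 1:                     # trapezium on the leftover piece
        total += h * (fvals[-2] + fvals[-1]) / 2.0
    return total

# Example: integrate x^2 on [0, 1]; the exact answer is 1/3.
n = 100
h = 1.0 / n
vals = [(i * h) ** 2 for i in range(n + 1)]
approx = simpson_with_tail(vals, h)
```

Since Simpson is exact for polynomials up to cubic, the even-count case reproduces 1/3 to machine precision; with an odd count, only the single trailing trapezium contributes error.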
 
  • #24
My formulation in post #18 is very similar to yours, with some change of variable names. I also go from the innermost integral to the outermost, from m=1 up to M. I think this shouldn't change the final result.

Indeed the code runs very slowly with the rectangular rule, even for M=2, especially when the step size is 0.001. I need to make it work first, and then I will optimize it. Creating the tables and using Simpson's rule should improve the running time, but I feel they will complicate the code while it is still not working.

How would you write a Python code for your formulation, without creating tables and using the rectangular rule? Just to keep things simple, and clear for now. Is your code I quoted in post #18 still valid?
 
  • #25
EngWiPy said:
Is approximating the upper limit ##\infty## for the Y RVs with 100 good enough? I went to 1000, but the running time was way too long.
The upper integration limit has very little effect on the accuracy, because the density function gets so small beyond a limit of about 7. So big integration limits just ramp up runtime with negligible improvement in accuracy.
What has a huge effect on accuracy is the step size dy near 0, because y is a denominator and small changes in that make big changes in the ratio x/y.

I suggest trying it with:
  • upper limit of ##L=7## for both x and y.
  • dx constant at ##\delta=0.01##
  • dy scaled harmonically so that ##1/y - 1/(y+dy)<\delta##. For each y integration, you will calculate a vector of values that are the endpoints of the integration intervals. The intervals will get progressively wider as y increases. That way the finest granularity is applied only where it's needed, near 0.
  • For all but the outer integral over y, set the lower integration limit to be ##x_{k-1}/z_{k-1}##, since the x-integral inside the y-integral will be zero where y is below that.
  • For the outer integral over y, set the lower integration limit to ##\varepsilon=0.0001##
You could try varying the values of ##\delta, \varepsilon## and ##L## to get an optimal balance between runtime and accuracy.
EngWiPy said:
Is your code I quoted in post #18 still valid?
Yes it's valid, but it will run either slowly or inaccurately, because of the issues identified above.
 
  • Like
Likes EngWiPy
  • #26
andrewkirk said:
The upper integration limit has very little effect on the accuracy, because the density function gets so small beyond a limit of about 7. So big integration limits just ramp up runtime with negligible improvement in accuracy.
[...]
dy scaled harmonically so that ##1/y - 1/(y+dy)<\delta##. For each y integration, you will calculate a vector of values that are the endpoints of the integration intervals. The intervals will get progressively wider as y increases. That way the finest granularity is applied only where it's needed, near 0.

Indeed, the upper limit of ##y## has very little effect, as confirmed by the different increasing values I tried.

I didn't understand point 3 quite well. How can I implement this? Say I start with ##y_0 = 0.0001##, and end with ##y_N = 7##, what is the vector ##y## in your formula? Do you mean like starting with ##y_0## in ##1/y_0-1/(y_0+dy_0)<\delta## to find ##dy_0##, and then we form ##y_1=y_0+dy_0## and solve ##1/y_1-1/(y_1+dy_1)<\delta## for ##dy_1##, and so on?
 
  • #27
EngWiPy said:
Do you mean like starting with ##y_0## in ##1/y_0-1/(y_0+dy_0)<\delta## to find ##dy_0##, and then we form ##y_1=y_0+dy_0## and solve ##1/y_1-1/(y_1+dy_1)<\delta## for ##dy_1##, and so on?
That's essentially correct. But let's write ##Y_0## rather than ##y_0## as we have an existing, different meaning for lower-case y with subscripts - the subscript denotes which integration variable it is.

For all y-integrals except the outermost one, start with ##Y_0=x_{k-1}/z_{k-1}##, and with the outermost one start with ##Y_0=\varepsilon=0.0001##, because if we start with 0 we'll get a divide-by-zero in the next step. Then we use the inequality you wrote, but turning it into an equality by replacing ##<## by ##=##, and solve it to calculate ##dY_0##, and hence ##Y_1=Y_0+dY_0##, then repeat for ##Y_2, Y_3## etc until we get a value, say ##Y_R##, that exceeds ##L=7##. Then the vector of interval endpoints for the current integration is ##Y_0, Y_1, Y_2, ...,Y_R##. Note that this vector changes with each integration because it depends on the value of ##Y_0##.
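That recurrence can be sketched as a small helper that builds the whole endpoint vector (illustrative; the starting value 0.01 below is just an example, not the thread's ##\varepsilon##):

```python
def harmonic_y_grid(y_start, delta, L=7.0):
    """Endpoints Y_0, Y_1, ..., Y_R with 1/Y_i - 1/Y_{i+1} = delta
    exactly, i.e. dY = Y * (1/(1 - Y*delta) - 1), stopping at the
    first endpoint that exceeds L. Requires Y*delta < 1 throughout."""
    ys = [y_start]
    while ys[-1] <= L:
        y = ys[-1]
        ys.append(y + y * (1.0 / (1.0 - y * delta) - 1.0))
    return ys

grid = harmonic_y_grid(0.01, 0.01)   # example: Y_0 = 0.01, delta = 0.01
```

Note that with ##Y_0=0.0001## as suggested above, the vector has on the order of ##(1/Y_0)/\delta \approx 10^6## entries, which is why a floor on ##dY## (as discussed a few posts later) matters for runtime.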
 
  • #28
You are right about the notations, we already have ##y_1,\ldots,y_M##.

OK, got the step-size for the integrations over ##y##. Now for the lower limit ##Y_{0,k}=x_{k-1}/z_{k-1}##, this is for level ##k## (where level is defined by the following formula), right?

This translates to the following formulation

[tex]
F_{m-1}(z_{m-2}) = \int_{y_{m-1}=\frac{x_{m-2}}{z_{m-2}}}^{\infty}\int_{x_{m-1}=x_{m-2}}^{y_{m-1}\,z_{m-2}}F_m\left(z_{m-2}-\frac{x_{m-1}}{y_{m-1}}\right)f_X(x_{m-1})f_Y(y_{m-1})\,dx_{m-1}\,dy_{m-1}
[/tex]

for ##3\leq m\leq M##, and the ##\max## function is no longer needed. Is this what you mean?
 
  • #29
Yes, that's what I had in mind.
 
  • #30
OK, I will try these changes, and report back. Thanks
 
  • #31
Solving for ##dy## yields the following formula

[tex]dy = y\left[\frac{1}{1-y\,\delta}-1\right][/tex]

To have a positive increment ##dy## we need ##0<y\,\delta<1##. How do we incorporate this in the calculation of ##dy##? In other words, what should ##dy## be when the condition is not met? Can we say

[tex]dy=\max\left[\rho,\,y\left(\frac{1}{1-y\,\delta}-1\right)\right][/tex]

where ##\rho = 0.001##, for example? Also, I think we need to put some condition on ##dy## like ##0<dy<1##, right?

The code becomes (before applying the conditions on ##dy##)

Python:
import math
K = 3
M = 2
dx = 0.01
delta = 0.01

y_Upper_lim = 7

#[1-FX(x)]^{(K-M)}
def F(x):
    return math.exp(-x*(K-M))

#fX(x) or fY(y)
def f(u):
    return math.exp(-u)

def G(z, m, x_prev_lim, y_current_lower_lim):
    res = 0
    #The lower limit of the integrations over x
    ux = x_prev_lim
    uy = y_current_lower_lim
    dy = uy*((1/(1-uy*delta)) - 1)
    while uy < y_Upper_lim:
        while ux <= uy*z:
            if m == M:
                res += F(ux)*f(ux)*f(uy)*dx*dy
                ux += dx
            else:
                res += G(z - (ux/uy), m+1, ux, z/ux)*f(ux)*f(uy)*dx*dy
                ux += dx
        uy += dy
    return res

(math.factorial(K)/math.factorial(K-M))*G(1, 1, 0.0001, 0.0001)

but ##dy## sometimes becomes negative, and others very large positive (e.g., 1000)!
 
  • #32
The calculation of dy needs to be inside the 'while uy...' loop, otherwise the increment size will be the same throughout that integration.

In the recursive call of G, the last argument needs to be ux/z rather than z/ux.

EngWiPy said:
To have a positive increment dy we need 0<yδ<1. How to incorporate this in the calculation of dy?
That condition will automatically be met in the range being considered. With ##y\le L=7## and ##\delta=0.01##, we'll have ##y\delta \le 0.07##.

EngWiPy said:
Also, I think we need to put some condition on dy like 0<dy<1
Again, with ##L=7## and ##\delta=0.01## that will be unnecessary. I think the largest value of dy we get will be 49/93 which is <1.

Lastly, to speed things up, you may wish to put a minimum of ##\varepsilon=0.0001## on each dy that is calculated; otherwise the first dy in the outer loop will be around ##10^{-6}##, which may make it quite slow. Since that minimum has already been applied implicitly in the first subinterval, accuracy will not be materially degraded by applying it for intervals with larger values of y.
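Put together, the two fixes might look like this inside the y-loop (an illustrative skeleton; the x-integration body is elided, and `next_dy` is just a name chosen here):

```python
epsilon = 1e-4    # floor on dy, and lower limit of the outermost y-integral
delta = 0.01
y_upper_lim = 7.0

def next_dy(uy):
    # dy solving 1/uy - 1/(uy + dy) = delta, floored at epsilon
    return max(epsilon, uy * (1.0 / (1.0 - uy * delta) - 1.0))

uy, steps = epsilon, 0
while uy < y_upper_lim:
    dy = next_dy(uy)   # recomputed on every pass, not once up front
    # ... x-integration for this uy goes here, weighted by f(uy)*dy ...
    uy += dy
    steps += 1
```

With these numbers the loop takes roughly two thousand passes: about a thousand at the ##\varepsilon## floor up to ##y\approx 0.1##, then harmonically widening steps up to 7.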
 
  • #33
Oops, you are right. I updated the code as follows, yet it is not working correctly

Python:
import math
K = 3
M = 2
dx = 0.001
dy = 0.001
delta = 0.001

y_Upper_lim = 7

#[1-FX(x)]^{(K-M)}
def F(x):
    return math.exp(-x*(K-M))

#fX(x) or fY(y)
def f(u):
    return math.exp(-u)

def G(z, m, x_prev_lim, y_current_lower_lim):
    res = 0
    #The lower limit of the integrations over x
    ux = x_prev_lim
    uy = y_current_lower_lim
    while uy < y_Upper_lim:
        while ux < uy*z:
            print(f'ux = {ux}, uy*z = {uy*z}, uy = {uy}, and dy = {dy}')
            if m == M:
                res += F(ux)*f(ux)*f(uy)*dx*dy
                ux += dx
            else:
                res += G(z - (ux/uy), m+1, ux, ux/z)*f(ux)*f(uy)*dx*dy
                ux += dx
        uy += dy
    return res

(math.factorial(K)/math.factorial(K-M))*G(1, 1, 0, 0.0001)

I put a print statement inside the while loop of ux, and it seems that ux is always close to uy*z. Is this what is supposed to happen? Also, from that print statement, controlling dy is tricky, because when it is very small (1E-6), we need to set it to, say, 0.0001, but it grows larger than 0.0001 at some point, in which case we probably want to make it smaller. So, I fixed it for now. The numerical result for the above parameters is 3.3406241938703932e-09, while the MC simulation for the same parameters gives 0.0489, which makes much more sense!
 
  • #34
Rather than moving the calculation of dy to inside the 'while uy...' loop, it has been moved to the top, where it sets dy to be constant at 0.001. It needs to be inside the aforesaid loop, and to use the formula that relates it to delta and uy.

Given epsilon = 0.0001, the statement is:

dy = max(epsilon, uy*((1/(1-uy*delta)) - 1))

EngWiPy said:
it seems that ux is always close uy*z
For uy close to 0 that should usually be the case. But I would not expect it for uy>1.
EngWiPy said:
but it grows larger than 0.0001 at some point, in which case we probably want to make it smaller
No, that would just slow the program down with no material improvement in accuracy. See second-last para of post #32.
 
  • #35
I did exactly this: dy = max(epsilon, uy*((1/(1-uy*delta)) - 1)), but I noticed that dy grows larger, which I thought wouldn't give accurate results. But still, this doesn't give the expected result. There is some logical error. Now I have this code

Python:
import math
K = 3
M = 2
dx = 0.001
#dy = 0.001
delta = 0.001

y_Upper_lim = 7

#[1-FX(x)]^{(K-M)}
def F(x):
    return math.exp(-x*(K-M))

#fX(x) or fY(y)
def f(u):
    return math.exp(-u)

def G(z, m, x_prev_lim, y_current_lower_lim):
    res = 0
    #The lower limit of the integrations over x
    ux = x_prev_lim
    uy = y_current_lower_lim
    while uy < y_Upper_lim:
        dy = max(0.0001, uy*((1/(1-uy*delta)) - 1))
        while ux < uy*z:
            print(f'ux = {ux}\n z = {z}\n uy*z = {uy*z}\n uy = {uy}\n ux/uy = {ux/uy}\n dy = {dy}\n')
            if m == M:
                res += F(ux)*f(ux)*f(uy)*dx*dy
                ux += dx
            else:
                res += G(z - (ux/uy), m+1, ux, ux/z)*f(ux)*f(uy)*dx*dy
                ux += dx
        uy += dy
    return res

(math.factorial(K)/math.factorial(K-M))*G(1, 1, 0, 0.0001)

and the first and last four print statements in the while loop of ux are as follows

ux = 0
z = 1
uy*z = 0.0001
uy = 0.0001
ux/uy = 0.0
dy = 0.0001

ux = 0
z = 1.0
uy*z = 0.0001
uy = 0.0001
ux/uy = 0.0
dy = 0.0001

ux = 0.001
z = 1.0
uy*z = 0.0010000000000000002
uy = 0.0010000000000000002
ux/uy = 0.9999999999999998
dy = 0.0001

ux = 0.002
z = 1.0
uy*z = 0.0020000000000000005
uy = 0.0020000000000000005
ux/uy = 0.9999999999999998
dy = 0.0001

.
.
.
ux = 6.96200000000066
z = 1
uy*z = 6.965954503188108
uy = 6.965954503188108
ux/uy = 0.9994323099317346
dy = 0.04886491290055385

ux = 6.96300000000066
z = 1
uy*z = 6.965954503188108
uy = 6.965954503188108
ux/uy = 0.9995758652764534
dy = 0.04886491290055385

ux = 6.96400000000066
z = 1
uy*z = 6.965954503188108
uy = 6.965954503188108
ux/uy = 0.9997194206211721
dy = 0.04886491290055385

ux = 6.965000000000661
z = 1
uy*z = 6.965954503188108
uy = 6.965954503188108
ux/uy = 0.9998629759658909
dy = 0.04886491290055385

which show that ux, uy, and uy*z are almost equal from ##ux\neq 0## onward (and thus ux/uy ~ 1), and ##z## remains 1 (as it was passed) throughout the execution of the code, for all the recursive calls! This shouldn't happen: z should change with each recursive call.
 

Related to Order Statistics: CDF Calculation for i.i.d. Random Variables

1. What is the definition of order statistics?

Order statistics refers to the collection of random variables that represent the ordered values of a sample from a population. In other words, it is the process of arranging a set of data in ascending or descending order.

2. How do you calculate the CDF for i.i.d. random variables?

The CDF (cumulative distribution function) for i.i.d. (independent and identically distributed) random variables can be calculated by first arranging the sample data in ascending order. Then, for each value in the sample, the CDF is equal to the proportion of values that are less than or equal to that value. This can be expressed as a formula: CDF(x) = (number of values ≤ x) / (total number of values).

3. What is the significance of order statistics in statistics?

Order statistics play a crucial role in statistics as they provide valuable information about the distribution of a population. They can be used to estimate parameters such as the mean, median, and standard deviation, and they also help in identifying outliers and extreme values in a dataset.

4. Can order statistics be used for non-i.i.d. random variables?

Yes, order statistics can also be used for non-i.i.d. random variables. However, the calculation of the CDF may be more complex as the variables are not independent and may have different distributions. In such cases, specialized techniques may be required to calculate the CDF.

5. How can order statistics be applied in real-life scenarios?

Order statistics have various applications in fields such as economics, finance, and engineering. In economics, they can be used to study income inequality, while in finance, they can help in analyzing stock market trends. In engineering, order statistics can be used to determine the strength of materials and predict failures in structures.
