Numerical Integration in Python

EngWiPy · Sep 3, 2018

OK, that explains it. I will switch to C++. This is taking too much time in Python, and this is only to set the parameters right, and for one point!

The first formula in post #29 is missing ##f_Y(y_2)##, and thus it should be

[tex]
K(K-2)\int_{y_1=0}^{\infty}\int_{x_1=x_0}^{\max(x_0,\,y_1z_0)}\left[\int_{y_2=0}^{\infty}\int_{x_2=x_1}^{\max(x_1,\,y_2z_1)}\left[1-F_X(x_2)\right]^{K-2}f_X(x_2)f_Y(y_2)\,dx_2\,dy_2\right]f_X(x_1)\,f_Y(y_1)\,dx_1\,dy_1
[/tex]

but the code is correct, because it accounts for it. It was a mistake of typing the formula from my notes.

EngWiPy · Sep 3, 2018

Could you post your C code here? Probably it is because of the machine and how efficient the program was written (I am no expert in programming in general, and C++ in particular), but my C++ code takes (way) longer than 20s, and I assume there is no speed difference between C and C++, or that what I read at least. This is my code

Code:

#include <iostream>
#include <cmath>

using namespace std;

double F(double x, int K, int M){
    return exp(-x * (K - M));
}

double f(double x){
    return exp(-x);
}

double G(double z, int m, int M, int K, double x_prev_lim, double y_current_lim){
    double res (0);
    double y_upper_lim (7);
    double uy = y_current_lim;
    double dx (0.001);
    double delta (0.001);
    while (uy < y_upper_lim){
        double ux = x_prev_lim;
        double dy = max(0.001, uy * (1/(1 - uy * delta) - 1));
        while (ux < uy * z){
            if (m == M){
                res += F(ux, K, M) * f(ux) * f(uy) * dx * dy;
            }else{
                res += G(z - (ux/uy), m+1, M, K, ux, (ux / z)) * f(ux) * f(uy) * dx * dy;
            }
            ux += dx;
        }
        uy += dy;
    }
    return res;
}

int fact (int A){
     if (A == 0) {
         return 1;
     }
     return A*fact(A-1);
}int main(){
 
    int K (3);
    int M (2);
 
    cout << (fact(K) / fact(K-M)) * G(1, 1, M, K, 0, 0.001)<<endl;

    return 0;
}

Note that when ##y_{m-1}<\frac{x_{m-2}}{z_{m-2}}##, the inner integration in the formula

[tex]
G_{m-1}(z_{m-2})=\int_{y_{m-1}=0}^{\infty}\left[\int_{x_{m-1}=x_{m-2}}^{\max(x_{m-2},\,y_{m-1}\,z_{m-2})}G_{m}(z_{m-2}-\frac{x_{m-1}}{y_{m-1}})\,f_X(x_{m-1})dx_{m-1}\right]f_Y(y_{m-1})\,dy_{m-1}
[/tex]

is 0, and ##\max(x_{m-2},\,y_{m-1}\,z_{m-2}) = y_{m-1} z_{m-2}## for ##y_{m-1}\geq\frac{x_{m-2}}{z_{m-2}}##. I tested it for dx = delta = 0.01 (delta is related to dy somehow), and it gives a result between 0 and 1, it is 0.33, which is not 0.048 as in the MC simulation. I am not sure if it will give better results with smaller dx and delta with 0.001. I am waiting to see, but as I said, it still takes a long time to run, even in C++.

SlowThinker · Sep 3, 2018

EngWiPy said:

Could you post your C code here?

Unfortunately I don't have it here, I did it at work. But I rewrote your Python program pretty much 1:1.
I guess it doesn't help that you carry around all the parameters. It would be better to declare K, M, dx, dy as constants as you did in Python (using #define).
I remember I added a test at the beginning of G() to return 0 if z<0 but it didn't seem to change much.

I'm still trying to understand how the sort function is "accounted for".
Do you have some intermediate steps between the Monte Carlo program and the integral formula? I can't really see how they can be the same thing.

EngWiPy · Sep 3, 2018

In the mathematical formulation, the sort function is accounted for by saying that ##X_{(1)}## and ##X_{(2)}## are the minimum two values. The derivation of the integration is straightforward from the probability in post #31. The simulation is a different approach, and thus there is no intermediary steps between the two. But basically, I have two set of random variables ##\{X_k\}_{k=1}^{K}##, and ##\{Y_k\}_{k=1}^{K}##. Then I order the first set of random variable in ascending order, and pick the first two after ordering. Then I formed the random variable ##\frac{X_{(1)}}{Y_1}+\frac{X_{(2)}}{Y_2}##, whose CDF I want to calculate.

What is your machine specifications on which you ran the code, if I may ask? It takes way too long on my machine, which isn't a good news, because I need to evaluate this for different values of z, which will be much larger than 1!

EngWiPy · Sep 3, 2018

I derived a semi-analytical result (as far as I could) for the case ##M=2##, and it turned out that the numerical evaluation is correct as presented in post #37, and the simulation is not. In the simulation I should wrote

Python:

if np.sum(Z[0][:M]) <= 1:

instead of

Python:

if np.sum(Z[:M]) <= 1:

So, my only mistake was not resetting ux for each uy value. Thanks @SlowThinker for pointing that out. I don't know how I missed it.

The programs is still running too slow, but for dx = delta = 0.01 it gives close results to the simulation (numerical is 0.33, and MC simulations is 0.31). I will change to Simpson's rule for more optimization.

SlowThinker · Sep 4, 2018

EngWiPy said:

The programs is still running too slow, but for dx = delta = 0.01 it gives close results to the simulation (numerical is 0.33, and MC simulations is 0.31).

Interesting. For me, the G(1,1,0) call returned 0.333 but multiplying by 3!/(3-1)! changed it to 2.00 .
BTW my computer is i7 @ 2.3GHz. RAM or number of cores don't really matter here.

EngWiPy · Sep 4, 2018

SlowThinker said:

Interesting. For me, the G(1,1,0) call returned 0.333 but multiplying by 3!/(3-1)! changed it to 2.00 .
BTW my computer is i7 @ 2.3GHz. RAM or number of cores don't really matter here.

I changed the code a little in the C++ version compared to the Python version, but the Python version should work also, but slower. I will check the latter in C++ and see.

I think the number of cores doesn't matter much, but I think the RAM makes a difference. At my my office I have i7 with 8 GB RAM, and it is much faster than my laptop for the same code, which is i5 @2.4 GHz and 6 GB RAM.

EngWiPy · Sep 4, 2018

The C++ code corresponding to the Python version for dx=dy=0.01 gave a result of 0.302535, and it took around an hour on my laptop! This is the code

Code:

#include <iostream>
#include <cmath>

using namespace std;

const int K (3);
const int M (2);
const double dx (0.01);
const double dy (0.01);
const double y_upper_lim (7);

double F(double x){
    return exp(-x * (K - M));
}

double f(double x){
    return exp(-x);
}

double G(double z, int m, double x_prev_lim){
    double res (0);
    double uy = 0;
    double ux;
    //double dy;
    while (uy < y_upper_lim){
        uy += dy;
        ux = x_prev_lim;
        while (ux < uy * z){
            ux += dx;
            if (m == M){
                res += F(ux) * f(ux) * f(uy) * dx * dy;
            }else{
                res += G(z - (ux/uy), m+1, ux) * f(ux) * f(uy) * dx * dy;
            }
        }
    }
    return res;
}

int fact (int A){
     if (A == 0) {
         return 1;
     }
     return A*fact(A-1);
}int main(){
  
    cout << (fact(K) / fact(K-M)) * G(1, 1, 0)<<endl;

    return 0;
}

Maybe you multiplied K!/(K-M)! twice!

EngWiPy · Sep 4, 2018

SlowThinker said:

Interesting. For me, the G(1,1,0) call returned 0.333 but multiplying by 3!/(3-1)! changed it to 2.00 .
BTW my computer is i7 @ 2.3GHz. RAM or number of cores don't really matter here.

Also it should be 3!/(3-2)! = 6.

Mark44 · Sep 6, 2018

EngWiPy said:

I haven't followed this thread for a while, but if you're dealing with somewhat large numbers, the recursive function you're using is not efficient, and might be the reason your code takes so long to run. In addition, the int return value will overflow for 13! and larger values of the factorial argument.

A better (because it's non-recursive), version follows.

C:

int fact (int n){
   if (n == 0 || n == 1) {
       return 1;
   }
   int prod = 1;
   for (int i = n; i > 1; i--)
       prod *= i;
   return prod;
}

EngWiPy · Sep 6, 2018

Mark44 said:
I haven't followed this thread for a while, but if you're dealing with somewhat large numbers, the recursive function you're using is not efficient, and might be the reason your code takes so long to run. In addition, the int return value will overflow for 13! and larger values of the factorial argument.

A better (because it's non-recursive), version follows.
C:
int fact (int n){
   if (n == 0 || n == 1) {
       return 1;
   }
   int prod = 1;
   for (int i = n; i > 1; i--)
       prod *= i;
   return prod;
}

I usually have small K and M, that are <= 5. The main reason why the code is slow is because of the large number of recursions for the G function. The function fact is called only twice to compute K! and (K-M)!.

By the way, can I parallelize the C++ code using multi threading?

Mark44 · Sep 6, 2018

EngWiPy said:

I usually have small K and M, that are <= 5. The main reason why the code is slow is because of the large number of recursions for the G function.

If there's some way to rewrite G() so that it's not recursive, that likely would speed up your program.

EngWiPy said:

By the way, can I parallelize the C++ code using multi threading?

Sure, if you can figure out what each thread needs to do, and that the results from each thread don't rely on results from other threads.

EngWiPy · Sep 6, 2018

Mark44 said:

If there's some way to rewrite G() so that it's not recursive, that likely would speed up your program.

Sure, if you can figure out what each thread needs to do, and that the results from each thread don't rely on results from other threads.

I want to evaluate the above function for different values of z. They are independent of each others, but in a for loop I need to do this sequentially.

Mark44 · Sep 6, 2018

EngWiPy said:

I want to evaluate the above function for different values of z. They are independent of each others, but in a for loop I need to do this sequentially.

If the result from each loop iteration depends on the value from the previous iteration, I don't see how you can do things with multiple threads.

EngWiPy · Sep 6, 2018

Mark44 said:

If the result from each loop iteration depends on the value from the previous iteration, I don't see how you can do things with multiple threads.

EngWiPy said:

I want to evaluate the above function for different values of z. They are independent of each others, but in a for loop I need to do this sequentially.

I need to evaluate say G(1,1,0), G(2,1,0) ... G(10,1,0). These function are independent of each others. But in for loop these will look something like

Code:

for(int i=1; i <= 10; i++){
cout  <<  G(i, 1,0);
}

which are evaluated sequentially. These can be evaluated in parallel. But how? I am not familiar with multithreading, but I think we can get Kx speed up, where K is the number of cores.

Mark44 · Sep 6, 2018

EngWiPy said:
I need to evaluate say G(1,1,0), G(2,1,0) ... G(10,1,0). These function are independent of each others. But in for loop these will look something like
Code:
for(int i=1; i <= 10; i++){
cout  <<  G(i, 1,0);
}
which are evaluated sequentially. These can be evaluated in parallel. But how? I am not familiar with multithreading, but I think we can get Kx speed up, where K is the number of cores.

My computer, which I've had for five years, has an Intel Quad Core Duo processor, with each of the four cores having effectively two processors. This means that eight threads can be running. You're not going to have K cores, with K being an arbitrary integer. Your program also can't be using all the cores, as some of them will be used by the OS and other processes.

If I were going to write the code, I wouldn't have it iterate 10 times. Instead, I would iterate 4, or 8, or some power of 2, times, and I wouldn't use a loop. If I had four threads to play with, I would create four thread objects to evaluate G(1, 1.0), G(2, 1.0), G(3, 1.0), and G(4, 1.0). When the last one of these is finished, I would use the threads to evaluate G with a first argument of 5, 6, 7, and 8.

Also, I wouldn't output them in the loop. Instead, I would store the return values from G() in an array, and print them out when the array is full.

I can't say too much more about threads and such, since there's quite a bit more than I have explained. Threads are pretty low level. At a higher level of abstraction are async objects, which are easier to work with.

Numerical Integration in Python

Similar threads

Hot Threads

Recent Insights