Optimization with Newton's method

In summary, the thread discusses using Newton's method to solve a system of equations inside a function, SSR, that measures the sum of squared residuals, and then searching for the K1 and K2 that minimize it. Because this search takes a long time, the poster asks about alternative optimization methods such as gradient descent, which would require the derivative of SSR with respect to K1 and K2; the poster is unsure how to differentiate the Newton-method outputs X_n, and considers approximating those derivatives numerically from the values returned by Newton's method.
  • #1
fahraynk
I have a system of equations which I solved with Newton's method.
Call Newton's method a function NM = f(K1, K2): K1 and K2 are the inputs, and a vector x = (x1, x2, x3, x4) is the output.

I have another function, SSR, the sum of square residuals. It looks like this:
$$\sum \left(\frac{v_0-v_1X_1-v_3X_3-2v_3X_4}{H_t}\right)^2$$
v_1 and v_3 are constants; v_0 and H_t are experimental values which are known for several experiments. I sum over all the experiments.

Right now, I solve the system by Newton's method for a given K1 and K2. I then use a binary search or brute force to check every K1 and K2 over a range to find the minimum SSR.

The problem is that it takes a long time, about 40 minutes per experiment, and I have maybe 100 experiments to calculate. I would rather use something like gradient descent.

To do gradient descent I would need to take the derivative of the SSR function with respect to K1 and K2.

$$\frac{dSSR}{dK_1}=\frac{dSSR}{dX_1}\frac{dX_1}{dK_1}+ \dots + \frac{dSSR}{dX_4}\frac{dX_4}{dK_1} \\ \frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}$$

I can approximate the derivative of Newton's method with respect to K1 as (NM(K1+dK1)-NM(K1))/dK1, but I have no idea if it is even possible to take the derivative of ##X_n## with respect to Newton's method!
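
Something like this minimal Python sketch is what I have in mind for the forward-difference approximation; `newton_solve` is just a placeholder name for the NM(K1, K2) routine described above:

```python
import numpy as np

def d_nm_dk1(newton_solve, k1, k2, dk1=1e-6):
    """Forward-difference approximation of dNM/dK1.

    newton_solve(k1, k2) is assumed to return the solution vector
    (x1, x2, x3, x4) of the system for the given K1 and K2.
    """
    x0 = np.asarray(newton_solve(k1, k2))
    x1 = np.asarray(newton_solve(k1 + dk1, k2))
    return (x1 - x0) / dk1  # vector of dX_n/dK1 for n = 1..4
```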

Is it possible? Or does anyone know a different way I can optimize this other than brute force and binary search?
 
  • #2
fahraynk said:
I have a system of equations which I solved with Newton's method.
Call Newton's method a function NM = f(K1, K2): K1 and K2 are the inputs, and a vector x = (x1, x2, x3, x4) is the output.

I have another function, SSR, the sum of square residuals. It looks like this:
$$\sum \left(\frac{v_0-v_1X_1-v_3X_3-2v_3X_4}{H_t}\right)^2$$
v_1 and v_3 are constants; v_0 and H_t are experimental values which are known for several experiments. I sum over all the experiments.

Let's name the function and state its arguments. Do you mean ##S(X_1,X_3,X_4) = \sum_i \left(\frac{v_{0_i}-v_1X_1-v_3X_3-2v_3X_4}{H_{t_i}}\right)^2## ?

Right now, I solve the system by Newton's method for a given K1 and K2.
It would be better to say that you solve the system "using" a given K1 and K2 for the values x1, x2, x3, x4, if that's what you mean.

I then use a binary search or brute force to check every K1 and K2 over a range to find the minimum SSR.

It's best to state optimization problems in the standard form: that would be "Minimize ... with respect to choices of ... subject to the constraints: ...".

In your case, the problem appears to be:

Minimize ##S(X_1,X_3,X_4) = \sum_i \left(\frac{v_{0_i}-v_1X_1-v_3X_3-2v_3X_4}{H_{t_i}}\right)^2##
with respect to choices of ##K_1,K_2##
subject to the constraints:
##(X_1,X_2,X_3,X_4) = NM(K_1,K_2)##
where ##NM(K_1,K_2)## is the result of solving some system of equations by Newton's method for the variables ##(X_1,X_2,X_3,X_4)##.

To do gradient descent I would need to take the derivative of the SSR function with respect to K1 and K2.

$$\frac{dSSR}{dK_1}=\frac{dSSR}{dX_1}\frac{dX_1}{dK_1}+ \dots + \frac{dSSR}{dX_4}\frac{dX_4}{dK_1} \\ \frac{dX_n}{dK_1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}$$

The correct notation would use partial derivatives, even though these are a pain in the neck to write in LaTeX.

##\frac{\partial S}{\partial K_1} = \frac{\partial S}{\partial X_1} \frac{\partial X_1}{\partial K_1} + ...##

I can approximate the derivative of Newton's method with respect to K1 as (NM(K1+dK1)-NM(K1))/dK1, but I have no idea if it is even possible to take the derivative of ##X_n## with respect to Newton's method!

The notation "##\frac{dX_n}{dK1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}##" makes no sense in the context of your problem.

The vector valued function ##NM(K1,K2)## can be represented as 4 component functions.
##X_1 = NM_1(K_1,K_2)##
##X_2 = NM_2(K_1,K_2)##
##X_3 = NM_3(K_1,K_2)##
##X_4 = NM_4(K_1,K_2)##

##\frac{\partial X_1}{\partial K_1} = \frac{\partial NM_1}{\partial K_1}##
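
As a rough illustration of how those partials could be used, here is a hedged Python sketch that approximates the gradient of the composed function ##S(NM(K_1,K_2))## by forward differences and takes plain gradient-descent steps; `newton_solve` and `ssr` are hypothetical placeholders for your Newton solver and your sum-of-squared-residuals routine:

```python
import numpy as np

def ssr_gradient(newton_solve, ssr, k1, k2, dk=1e-6):
    """Forward-difference gradient of S(K1, K2) = ssr(newton_solve(K1, K2))."""
    s0 = ssr(newton_solve(k1, k2))
    ds_dk1 = (ssr(newton_solve(k1 + dk, k2)) - s0) / dk
    ds_dk2 = (ssr(newton_solve(k1, k2 + dk)) - s0) / dk
    return np.array([ds_dk1, ds_dk2])

def gradient_descent(newton_solve, ssr, k1, k2, step=1e-3, iters=200):
    """Plain gradient descent on (K1, K2); the step size would need tuning."""
    k = np.array([k1, k2], dtype=float)
    for _ in range(iters):
        k -= step * ssr_gradient(newton_solve, ssr, k[0], k[1])
    return k
```

Each gradient estimate costs three Newton solves, so this only pays off if it needs far fewer (K1, K2) evaluations than the brute-force scan.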
The problem is it takes a long time, about 40 minutes per experiment
Which step of the problem takes a long time? Using Newton's method? There are many tricks that can be used to speed up programs that solve particular sets of equations. For example, the final result of 2 steps of Newton's method can sometimes be implemented more concisely as a single sequence of code. Custom-made programs for solving particular sets of equations can be much faster than using a general-purpose equation solver.

Is it possible? Or does anyone know a different way I can optimize this other than brute force and binary search?
There are quasi-random search methods, such as Simulated Annealing.
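
A minimal sketch of that idea, assuming a hypothetical `ssr_of(k1, k2)` that runs the Newton solve and returns the SSR for a candidate pair:

```python
import math
import random

def simulated_annealing(ssr_of, k1, k2, scale=0.1, temp=1.0, cooling=0.95, iters=500):
    """Quasi-random search over (K1, K2): occasionally accept a worse
    candidate, with a probability that shrinks as the temperature drops."""
    best = current = (k1, k2)
    best_val = current_val = ssr_of(*current)
    for _ in range(iters):
        candidate = (current[0] + random.gauss(0, scale),
                     current[1] + random.gauss(0, scale))
        val = ssr_of(*candidate)
        if val < current_val or random.random() < math.exp((current_val - val) / temp):
            current, current_val = candidate, val
            if val < best_val:
                best, best_val = candidate, val
        temp *= cooling
    return best, best_val
```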

 
  • #3
Stephen Tashi said:
##\frac{\partial S}{\partial K_1} = \frac{\partial S}{\partial X_1} \frac{\partial X_1}{\partial K_1} + ...##

The notation "##\frac{dX_n}{dK1}=\frac{dX_n}{dNM}\frac{dNM}{dK_1}##" makes no sense in the context of your problem.

The vector valued function ##NM(K1,K2)## can be represented as 4 component functions.
##X_1 = NM_1(K_1,K_2)##
##X_2 = NM_2(K_1,K_2)##
##X_3 = NM_3(K_1,K_2)##
##X_4 = NM_4(K_1,K_2)##

##\frac{\partial X_1}{\partial K_1} = \frac{\partial NM_1}{\partial K_1}##

Thanks for your help!

Ah! I see that I made a big mistake. I think the partial of ssr should be
$$\frac{\partial{ssr}}{\partial{k_1}}=\frac{\partial{ssr}}{\partial{x_1}}\frac{\partial{x_1}}{\partial{k_1}}+ \dots + \frac{\partial{ssr}}{\partial{x_4}}\frac{\partial{x_4}}{\partial{k_1}}$$
Originally I had the partial of ##x_1## with respect to NM, which was weird like you said.

Newton's method looks like this: $$x=x_{0}-J(k_1,k_2)^{-1}\,G(H_t,G_t,k_1,k_2)$$
G is a vector consisting of 4 equations, and ##J^{-1}## is the inverse of the Jacobian matrix.
The output for a particular ##x_n## for ##n \in [1,4]## is a large expression, ##x_n=f_n(x_1,x_2,x_3,x_4,k_1,k_2,H_t,G_t)##. I think I could either compute the derivative of this function by hand, which would be a giant mess, or approximate the derivative by taking the value of ##x_n## from Newton's method at two values of ##k_1## close together and dividing by ##dk_1##.
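
For reference, a minimal sketch of that update step in Python, with `G` and `jacobian` as hypothetical placeholders for the four residual equations and their Jacobian (the dependence on the current iterate ##x_0## is written explicitly, and the linear system is solved rather than forming the inverse):

```python
import numpy as np

def newton_step(x0, G, jacobian, k1, k2, Ht, Gt):
    """One Newton update: x = x0 - J^{-1} G, evaluated at the current iterate."""
    g = G(x0, k1, k2, Ht, Gt)          # residual vector, shape (4,)
    J = jacobian(x0, k1, k2, Ht, Gt)   # Jacobian matrix, shape (4, 4)
    return x0 - np.linalg.solve(J, g)
```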

You asked, "Which step of the problem takes a long time?"
Newton's method takes the longest, because for whatever initial guess I input, a certain percentage of the data does not converge. I ended up making a systematic method of guessing that tries every possible combination of guesses, but it is a maximum of 100 guesses in the worst case for a choice of K1 and K2. Plus, for every K1 I have to check a ton of K2 values with this method, so it just takes a long time. I could try to develop better guesses, or lower the number of guesses to 50 or so, but I would rather just use gradient descent if possible because it's a cooler algorithm.
 
  • #4

What is Newton's method and how does it work?

Newton's method is an algorithm used to find the roots of a function. It works by taking an initial guess for the root and then using the derivative of the function to refine the guess until it is close enough to the actual root.
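
As a concrete illustration (not the solver from the thread), a minimal 1D Newton iteration for f(x) = x^2 - 2:

```python
def newton_1d(f, df, x0, tol=1e-10, max_iter=50):
    """Basic Newton iteration: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# The positive root of x^2 - 2 is sqrt(2) ≈ 1.41421356...
root = newton_1d(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
```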

What are the advantages of using Newton's method for optimization?

One advantage of Newton's method is that it converges very quickly (quadratically near a root), meaning it can find the solution in fewer steps compared to other methods. Additionally, with a suitable starting point it can be applied to complicated functions with multiple local minima/maxima, making it a versatile tool for optimization problems.

What are the limitations of Newton's method?

One limitation of Newton's method is that it requires the function to be differentiable, meaning it cannot be used on functions with discontinuities or sharp turns. Additionally, it may not always converge if the initial guess is too far from the actual root or if the function has multiple roots close together.

How do you choose the initial guess for Newton's method?

The initial guess for Newton's method should be close to the actual root for it to converge. One common approach is to plot the function and visually estimate the location of the root. Alternatively, you can use other methods such as the bisection or secant method to find a good initial guess.
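
For example, a short bisection sketch (assuming the function is known to change sign on [a, b]) that could supply a rough starting point for Newton's method:

```python
def bisect(f, a, b, iters=20):
    """Halve a sign-changing interval to get a rough root estimate."""
    fa = f(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        if fa * f(m) <= 0:
            b = m
        else:
            a, fa = m, f(m)
    return 0.5 * (a + b)
```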

Are there any alternatives to Newton's method for optimization?

Yes, there are several other methods for optimization such as gradient descent, simulated annealing, and genetic algorithms. Each method has its own advantages and limitations, so it's important to choose the appropriate technique based on the specific problem at hand.
