Show Hessian is positive definite

In summary: The Hessian of f at a is the (n+1) x (n+1) matrix with entries 2/(j+k+1), j,k = 0,1,...,n. Writing each entry as ∫₀¹ 2 x^(j+k) dx gives v^T H v = 2 ∫₀¹ (v_0 + v_1 x + ... + v_n x^n)² dx > 0 for all v ≠ 0, so the Hessian is positive definite.
  • #1
kingwinner

Homework Statement


Consider the function f(a) = ∫₀¹ [g(x) − (a_n x^n + a_{n-1} x^{n-1} + ... + a_0)]² dx,
where a = (a_0, a_1, ..., a_n) and g is some known function defined on [0,1].

From this, we can show that:

[attached image: optim1.JPG]


Thus, the Hessian of f at a is the matrix [2/(j+k+1)] for j=0,1,2,...,n; k=0,1,2,...,n.

Fact: This Hessian matrix is positive definite.

Now how can we prove that this (n+1) x (n+1) matrix is positive definite? (i.e. v^T (Hessian) v > 0 for all v ∈ R^(n+1), v ≠ 0.)
When I multiply out v^T (Hessian) v, it just doesn't seem clear to me at all that it is > 0.

Homework Equations


N/A

The Attempt at a Solution


Shown above.

Any help is appreciated!
 
  • #2
kingwinner said:
(quoting post #1)

You can do it indirectly: for each fixed x, the function [itex]F_x(\vec{a}) = [g(x) - a_0 - a_1 x - ... - a_n x^n]^2[/itex] is a convex function of [itex]\vec{a}[/itex]; you can get this by looking at the Hessian of [itex]F_x[/itex] in [itex]\vec{a}[/itex]-space, or directly from the definition of convexity. Integrating preserves the inequality [itex] F_x(\lambda \vec{a} + (1-\lambda) \vec{b}) \le \lambda F_x(\vec{a}) + (1-\lambda) F_x(\vec{b}),[/itex] and for [itex] \vec{a} \not= \vec{b} \mbox{ and } 0 < \lambda < 1[/itex] the inner inequality is strict for all but finitely many x: it can fail to be strict only where the polynomial with coefficients [itex]\vec{a}-\vec{b}[/itex] vanishes, which happens at most n times. So your [itex] f(\vec{a})[/itex] is strictly convex and quadratic in [itex] \vec{a}.[/itex] Then, by a standard theorem, the Hessian of f is positive definite.

RGV
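Beyond this indirect argument, the claim is easy to sanity-check by machine for small n. The sketch below is mine, not from the thread (all helper names are invented for illustration); it applies Sylvester's criterion — a symmetric matrix is positive definite iff all its leading principal minors are positive — to the matrix [2/(j+k+1)], in exact rational arithmetic so there is no floating-point doubt:

```python
from fractions import Fraction

def hessian(n):
    """The (n+1) x (n+1) Hessian from the thread: entries 2/(j+k+1), kept exact."""
    return [[Fraction(2, j + k + 1) for k in range(n + 1)] for j in range(n + 1)]

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    m = len(M)
    d = Fraction(1)
    for i in range(m):
        pivot = next((r for r in range(i, m) if M[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            d = -d                      # a row swap flips the determinant's sign
        d *= M[i][i]
        for r in range(i + 1, m):
            f = M[r][i] / M[i][i]
            for c in range(i, m):
                M[r][c] -= f * M[i][c]
    return d

def is_positive_definite(H):
    """Sylvester's criterion: symmetric H is positive definite iff every
    leading principal minor has positive determinant."""
    return all(det([row[:k] for row in H[:k]]) > 0 for k in range(1, len(H) + 1))

print(all(is_positive_definite(hessian(n)) for n in range(6)))  # True
```

This checks the cases n = 0..5 exactly; of course it is a finite check, not a proof for all n.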
 
  • #3
(I've just found some typos and realized some formatting errors in my previous post (post #3), but for some reason I'm unable to edit it now (weird?!). So sorry, I have to re-post it. Moderator, if you see this message, please delete my post #3.)

Mod note: Done

But the problem is actually that I'm trying to demonstrate the converse of your theorem in this example: I want to show directly that the Hessian matrix of f is positive definite, and hence conclude that f is strictly convex.

In this example, the Hessian of f at a is an (n+1) x (n+1) matrix M given by
M_jk = 2/(j+k+1) for j=0,1,2,...,n; k=0,1,2,...,n.
I need to show v^T M v > 0 for all v ∈ R^(n+1), v ≠ 0.

When I multiply out v^T M v, I get some squared terms, BUT then I also get a whole bunch of cross terms; how can I show that the total is > 0?

I'm also pretty confused generally about how to use the definition of positive definiteness (i.e. v^T A v > 0) to actually SHOW a matrix is positive definite. How can we actually prove such things? When I multiply out v^T A v, I will always get a sum of squares plus many cross terms (unless the matrix A is diagonal), and it's not at all obvious whether the result is > 0 or not.

Thanks for helping.
 
  • #4
kingwinner said:
(quoting post #3)

Here is a partial approach that gets at the essence of the problem.

Note that your f(a) and the simpler function [itex] h(a) = \int_0^1 [a_0 + a_1 x + \cdots + a_n x^n]^2 dx [/itex] have the same a-Hessian (I write h rather than g, to avoid a clash with your known function g(x)), so it is enough to look at h(a). Now, let us replace the integration by a summation over N distinct points:
[tex] h_N(a) = \sum_{j=1}^N w_j [a_0 + a_1 x_j + \cdots + a_n x_j^n]^2, \; (\mbox{all }w_j > 0),[/tex]
and note that each term in the summation is a convex function of the a_k (but not strictly convex). The jth term vanishes on a hyperplane [itex] <p_j,a> = 0,[/itex] where [itex] p_j = (1, x_j, x_j^2, \ldots, x_j^n).[/itex] If we have N ≥ n+1 terms, we cannot have all N terms vanishing for any nonzero a, because any n+1 of the vectors {p_j} are linearly independent: their determinant is a Vandermonde determinant, which is non-vanishing for distinct x_j. That means no nonzero vector a can lie on n+1 or more of the hyperplanes, so with N ≥ n+1 terms at least one term is nonzero and the summation does not vanish for any nonzero vector a. That makes h_N(a) a positive-definite quadratic form (that is, a strictly convex function).

Now the clincher: the integrand q(x) of h(a) is a polynomial of degree 2n in x, so there is a quadrature scheme that renders the approximation h_N(a) *exact*: a Gauss-Legendre formula with N ≥ n+1 nodes is exact for polynomials of degree up to 2N−1 ≥ 2n+1, has the form [itex] \sum w_j q(x_j) \;\mbox{ with all } w_j > 0,[/itex] and so evaluates h(a) exactly.

RGV
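The linear-independence step above rests on the Vandermonde determinant being non-zero for distinct nodes. A quick exact check of that fact (an illustration added here, not part of the thread; the helper names are invented) compares a cofactor-expansion determinant against the product formula ∏_{i<j}(x_j − x_i):

```python
from fractions import Fraction
from itertools import combinations

def vandermonde(xs):
    """Rows p_j = (1, x_j, x_j^2, ..., x_j^n) for the given nodes."""
    n = len(xs) - 1
    return [[x ** k for k in range(n + 1)] for x in xs]

def det(M):
    """Exact determinant by cofactor expansion along the first row (small n only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

xs = [Fraction(k, 5) for k in range(4)]      # four distinct nodes in [0, 1]
V = vandermonde(xs)
prod = Fraction(1)
for i, j in combinations(range(len(xs)), 2):
    prod *= xs[j] - xs[i]                    # Vandermonde product formula
print(det(V) == prod, prod != 0)  # True True
```

Since the product formula has one factor per pair of nodes, it is non-zero exactly when all nodes are distinct, which is the nonsingularity used above.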
 
  • #5
Hmm... I think your method only works for summations, and sorry, I don't have background in the Gauss-Legendre formula. My book also doesn't assume an advanced math background, so I think this problem should be solvable without that... but thanks anyway...

If I write my Hessian matrix (call this matrix M) entries back in terms of an integral:

2/(j+k+1) = ∫₀¹ 2 x^j x^k dx

Is this going to help us to prove positive definiteness of the Hessian M?

v^T M v = Σ_j Σ_k [ ∫₀¹ 2 x^j x^k dx ] v_j v_k

Is there any way to show that this is > 0?

Thanks.
 
  • #6
Actually, I think I've found a method to prove positive definiteness of the Hessian of f.

If I write my Hessian matrix (call this matrix M) entries back in terms of an integral:

2/(j+k+1) = ∫₀¹ 2 x^j x^k dx

Then v^T M v = Σ_j Σ_k [ ∫₀¹ 2 x^j x^k dx ] v_j v_k

= 2 ∫₀¹ (v_n x^n + ... + v_1 x + v_0)² dx

> 0 for all non-zero v, since a non-zero polynomial of degree at most n vanishes at no more than n points, so the integral of its square over [0,1] is strictly positive.

So this works!

But this idea of writing the expression back in terms of an integral looks very tricky to me. How come we can't prove positive definiteness when we write out the entries of the Hessian matrix as [2/(j+k+1)]? They should, in principle, be the same, right? Where is the loss of information occurring?
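The identity used above, v^T M v = 2 ∫₀¹ (Σ v_i x^i)² dx, can be confirmed in exact arithmetic by integrating the expanded square term by term (x^j x^k integrates to 1/(j+k+1) over [0,1]). A small sketch added for illustration (the function names are invented):

```python
from fractions import Fraction

def quad_form(v):
    """v^T M v for the Hessian M with entries M_jk = 2/(j+k+1)."""
    n = len(v)
    return sum(Fraction(2, j + k + 1) * v[j] * v[k]
               for j in range(n) for k in range(n))

def twice_integral_of_square(v):
    """2 * integral over [0,1] of (v_0 + v_1 x + ... + v_n x^n)^2 dx,
    integrating the expanded square term by term: x^j x^k integrates to 1/(j+k+1)."""
    n = len(v)
    return 2 * sum(v[j] * v[k] * Fraction(1, j + k + 1)
                   for j in range(n) for k in range(n))

v = [Fraction(3), Fraction(-5), Fraction(1), Fraction(2)]  # an arbitrary test vector
print(quad_form(v) == twice_integral_of_square(v))  # True
print(quad_form(v) > 0)                             # True
```

The two functions compute literally the same double sum, which is the point: the integral representation adds no new numbers, only a new way of *grouping* them.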
 
  • #7
kingwinner said:
(quoting post #6)

Sometimes a result about one thing is obtained most easily (if at all) by examining a seemingly-different result or system or method, or whatever. That is common, and you get used to it after a while. This is such a case.

There is another way you could see the result, but it still involves integration: change to an orthonormal set of polynomials on [0,1]: [itex]u_i[/itex] is a polynomial of degree i and [itex]<u_i,u_i> = 1, \;<u_i,u_j> = 0 \mbox{ for } i \not= j,[/itex] where [itex]<u,v>[/itex] means [itex] <u,v> = \int_0^1 u(x)v(x) dx.[/itex] We can obtain [itex]u_0, u_1, \ldots, u_n[/itex] from [itex]p_0 =1, p_1 = x, p_2 = x^2, \ldots, p_n = x^n[/itex] using Gram-Schmidt orthogonalization. Therefore, in principle we can get the constant coefficients [itex]\{c_{ij}\}[/itex] and [itex]\{e_{ij}\}[/itex] giving [itex]u_i(x) = \sum_j c_{ij} p_j(x)[/itex] and [itex]p_i(x) = \sum_j e_{ij} u_j(x).[/itex] Your quadratic form is [itex]Q = (1/2)a^T H a = \int_0^1 q(x)^2 dx, [/itex] where [itex]q(x) = \sum_i a_i p_i(x).[/itex] Writing, instead, [itex]q(x) = \sum_j b_j u_j(x), [/itex] we get [itex]Q = b_0^2 + b_1^2 + b_2^2 + \cdots + b_n^2.[/itex] Since each b_i is a linear combination of the a_j, with known coefficients, we end up with Q as a sum of squares of linear functions of the a_j, as you wanted originally.

RGV
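This Gram-Schmidt construction can be carried out exactly if one settles for an orthogonal (rather than orthonormal) basis, which avoids square roots; Q then comes out as a sum of squares with positive weights 1/<u_j,u_j> instead of a bare sum of squares. A sketch added here for illustration (helper names invented), representing a polynomial by its coefficient list and using <u,v> = Σ c_i d_j / (i+j+1):

```python
from fractions import Fraction

def inner(u, v):
    """<u, v> = integral_0^1 u(x) v(x) dx for polynomials given as coefficient
    lists: x^i times x^j integrates to 1/(i+j+1)."""
    return sum(ci * dj * Fraction(1, i + j + 1)
               for i, ci in enumerate(u) for j, dj in enumerate(v))

def gram_schmidt(n):
    """Orthogonal (deliberately unnormalized, to stay in exact rationals)
    polynomials u_0, ..., u_n built from 1, x, ..., x^n."""
    basis = []
    for i in range(n + 1):
        p = [Fraction(0)] * i + [Fraction(1)] + [Fraction(0)] * (n - i)  # x^i
        for u in basis:
            c = inner(p, u) / inner(u, u)                 # projection coefficient
            p = [pc - c * uc for pc, uc in zip(p, u)]     # subtract the projection
        basis.append(p)
    return basis

n = 3
a = [Fraction(1), Fraction(-4), Fraction(2), Fraction(5)]  # coefficients of q(x)
U = gram_schmidt(n)
# Q = (1/2) a^T H a = integral of q^2 = sum over j of <q,u_j>^2 / <u_j,u_j>:
Q_direct = inner(a, a)
Q_sos = sum(inner(a, u) ** 2 / inner(u, u) for u in U)
print(Q_direct == Q_sos)  # True: Q is a sum of squares with positive weights
```

Each term <q,u_j>² / <u_j,u_j> is a positive multiple of the square of a linear function of the a_j, which is exactly the sum-of-squares form asked for.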
 
  • #8
I see. I also think that in lower dimensions, completing the square may be helpful for showing positive definiteness.

In this example, there is exactly one local minimum.
By positive definiteness of the Hessian of f at all a ∈ R^(n+1), f is strictly convex on R^(n+1). This implies that every local minimum must be a global minimum, so the local minimum here is a global minimum. But could there be more than one global minimum?

Claim: the global minimum point is unique.

How can uniqueness of the global minimum be proved/justified in this example?

Thanks.
 
  • #9
kingwinner said:
(quoting post #8)

The global minimum is unique for a strictly convex function. You can get an easy contradiction if you assume otherwise.

RGV
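The contradiction can be spelled out in a line or two (a sketch added here, using the strict-convexity inequality quoted earlier in the thread):

```latex
% Suppose a != b are both global minimizers, with common value m = f(a) = f(b).
% Strict convexity at the midpoint (lambda = 1/2) gives
f\!\left(\tfrac{1}{2}\vec{a} + \tfrac{1}{2}\vec{b}\right)
  < \tfrac{1}{2} f(\vec{a}) + \tfrac{1}{2} f(\vec{b}) = m,
% so the midpoint attains a value below the global minimum m -- impossible.
% Hence the global minimizer is unique.
```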
 

Related to Show Hessian is positive definite

1. What is the Hessian matrix?

The Hessian matrix is a square matrix of second-order partial derivatives of a function with respect to its variables. It is used to determine the curvature and shape of a function's graph.

2. Why is it important to show that the Hessian is positive definite?

Showing that the Hessian is positive definite everywhere ensures that the function is strictly convex, so it has at most one minimizer. This is useful in optimization problems, as it guarantees that the solution found is the unique global minimum and not merely a local one.

3. How do you know if the Hessian is positive definite?

A symmetric matrix is positive definite if and only if all its eigenvalues are positive. So one way to show that the Hessian is positive definite is to compute its eigenvalues and confirm they are all positive; alternatives include Sylvester's criterion (all leading principal minors positive) and showing v^T H v > 0 directly from the definition, as is done in this thread.

4. Can the Hessian be positive definite for all points in a function?

Yes, it can. In general the Hessian depends on the point, since its entries are second-order partial derivatives evaluated there, so positive definiteness must be checked point by point; but it can hold at every point, and for the quadratic f in this thread the Hessian is the same positive definite matrix at every a.

5. How is the positive definiteness of the Hessian related to convexity?

A twice-differentiable function is convex if and only if its Hessian is positive semidefinite at every point, and it is strictly convex if its Hessian is positive definite at every point; its graph then curves upward and has no interior local maxima. Therefore, showing that the Hessian is positive definite everywhere is a way to prove that a function is strictly convex.
