Matrix derivative of quadratic form?

In summary: The poster wants to derive the derivative of f(X) = aT*X*b with respect to X without assuming the answer in advance. Taking the derivative to be the matrix of partial derivatives, which for n = 2 is [itex] \begin{pmatrix} \frac{\partial f}{\partial x_{1\ 1}} &\frac{\partial f}{\partial x_{1\ 2}} \\ \frac{\partial f}{\partial x_{2\ 1}} & \frac{\partial f}{\partial x_{2\ 2}} \end{pmatrix} [/itex], each entry is [itex] \partial f / \partial x_{ij} = a_i b_j [/itex], so the derivative is a*bT.
  • #1
perplexabot

Homework Statement


Find the derivative of f(X) with respect to X, where
f(X) = aT * X * b

and:
X is n x n
a and b are n x 1
ai is the i'th element of a
Xij is the element in row i and column j
aT = transpose(a)
bT = transpose(b)

Homework Equations


I tried using the product rule, which I assume is wrong.
I know the answer to be a*bT (but I have not the slightest clue how)

The Attempt at a Solution

I tried many things, to the point where punching a hole through my screen no longer seems like a bad idea.

My last attempt was to use the product rule along with some matrix properties, here is what I did:
d(f)/dX = [d(aT*X)/dX]*b + (aT*X)*[d(b)/dX] = [d(aT*X)/dX]*b = (d/dX)[Σai*Xi1, Σai*Xi2, ⋅ ⋅ ⋅ , Σai*Xin]*b (each sum runs over i)

I have no idea what to do next. I have a feeling using the product rule doesn't apply to matrices.
PLEASE HELP ME!

Thanks for reading...
 
  • #2
perplexabot said:
a and b are n x 1

As an example take [itex] n = 2 [/itex]

[itex] a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} [/itex]

[itex] b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} [/itex]

[itex] X = \begin{pmatrix} x_{1\ 1} & x_{1\ 2} \\ x_{2\ 1} & x_{2\ 2} \end{pmatrix} [/itex]

Then [itex] f(X) = a^T X b [/itex] is a single number. ( We could say it is a 1x1 matrix.)
perplexabot said:
I know the answer to be a*bT

Then the answer would be [itex] \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \begin{pmatrix} b_1 & b_2 \end{pmatrix} [/itex]. What kind of multiplication does that represent? It can be worked as ordinary matrix multiplication of a 2x1 matrix by a 1x2 matrix, which produces a 2x2 matrix:

[itex] ab^T = \begin{pmatrix} a_1b_1 & a_1b_2 \\ a_2b_1 & a_2 b_2 \end{pmatrix} [/itex]

I don't know the details of your class materials, so I must guess about how "the derivative" of f(X) is defined.

One guess is that the derivative of [itex] f [/itex] with respect to [itex] X [/itex] is:

[itex] \begin{pmatrix} \frac{\partial f}{\partial x_{1\ 1}} &\frac{\partial f}{\partial x_{1\ 2}} \\ \frac{\partial f}{\partial x_{2\ 1}} & \frac{\partial f}{\partial x_{2\ 2}} \end{pmatrix}[/itex]

Is that the definition you use?
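If that entrywise definition is the one in use, it can be checked numerically before doing any algebra. The following is a minimal sketch in plain Python; the function names and the test values for a, b and X are illustrative assumptions, not something from the thread. It perturbs one entry of X at a time and compares the resulting matrix of difference quotients with the outer product a*bT.

```python
def f(X, a, b):
    """Return the scalar a^T X b for a square matrix X given as nested lists."""
    n = len(a)
    return sum(a[i] * X[i][j] * b[j] for i in range(n) for j in range(n))


def numerical_derivative(X, a, b, h=1e-4):
    """Approximate the matrix of partials df/dx_ij by forward differences."""
    n = len(a)
    base = f(X, a, b)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            bumped = [row[:] for row in X]  # copy X so the original is untouched
            bumped[i][j] += h               # perturb only entry (i, j)
            D[i][j] = (f(bumped, a, b) - base) / h
    return D


def outer(a, b):
    """Return the outer product a b^T as nested lists."""
    return [[ai * bj for bj in b] for ai in a]


# Arbitrary test values (assumed for illustration only).
a = [2.0, -1.0]
b = [3.0, 5.0]
X = [[1.0, 4.0], [0.5, -2.0]]

D = numerical_derivative(X, a, b)
expected = outer(a, b)
max_error = max(abs(D[i][j] - expected[i][j]) for i in range(2) for j in range(2))
print("largest deviation from a*bT:", max_error)
```

Because f is linear in every entry of X, the difference quotient does not depend on X at all, which is why the result matches a*bT up to floating-point rounding.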
 
  • #3
Looking at the derivative with respect to the first term (1,1), you could use the limit definition to see what happens in the matrix multiplication.
## \lim_{h\to 0} \frac{f(X+\begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix})-f(X)}{h} = ? ##
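To see what this limit evaluates to, the numerator can be expanded directly. Here is a sketch for n = 2, writing [itex] E_{11} [/itex] (a label introduced here, not used elsewhere in the thread) for the matrix with 1 in the top left corner and 0 elsewhere. Adding [itex] hE_{11} [/itex] changes only the entry [itex] x_{1\ 1} [/itex], which is exactly what a partial derivative with respect to [itex] x_{1\ 1} [/itex] requires:

[tex] f(X + hE_{11}) - f(X) = a^T (X + hE_{11}) b - a^T X b = h\, a^T E_{11} b = h\, a_1 b_1 [/tex]

[tex] \lim_{h\to 0} \frac{f(X + hE_{11}) - f(X)}{h} = a_1 b_1 [/tex]

Repeating this with the h placed in position (i, j) gives [itex] a_i b_j [/itex] in the same way.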
 
  • #4
And to take a stab at why the product rule isn't working the way you had it above...
You are treating this as a product of two functions of X, where really you have a composition of functions. With g(X) = X*b and h(v) = aT*v, f(X) = h(g(X)). You should use the chain rule instead of the product rule.
 
  • #5
RUber said:
And to take a stab at why the product rule isn't working the way you had it above...
You are treating this as a product of two functions of X, where really you have a composition of functions. With g(X) = X*b and h(v) = aT*v, f(X) = h(g(X)). You should use the chain rule instead of the product rule.

He should not use any of those things; it is just a straightforward matter, like saying ##(d/dx) (cx) = c## for constant ##c##. In fact,
[tex] f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j [/tex]
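The expansion above says that the matrix product aT*X*b and the weighted sum of the entries of X with weights [itex] c_{ij} = a_i b_j [/itex] are the same number. A minimal sketch that checks this on one example follows; the function names and the 3x3 test values are assumptions made for illustration.

```python
def bilinear_via_matrix_product(a, X, b):
    """Compute a^T X b in two steps: v = X b, then the dot product of a and v."""
    v = [sum(X[i][j] * b[j] for j in range(len(b))) for i in range(len(a))]
    return sum(a[i] * v[i] for i in range(len(a)))


def bilinear_via_coefficients(a, X, b):
    """Compute the double sum of c_ij * x_ij with c_ij = a_i * b_j."""
    c = [[ai * bj for bj in b] for ai in a]
    return sum(c[i][j] * X[i][j] for i in range(len(a)) for j in range(len(b)))


# Arbitrary 3x3 example (assumed values, not from the thread).
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
X = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0],
     [4.0, 0.0, 5.0]]

lhs = bilinear_via_matrix_product(a, X, b)
rhs = bilinear_via_coefficients(a, X, b)
print(lhs, rhs)
```

Written this way, f is a linear function of the n*n variables [itex] x_{ij} [/itex], which is the point of the comparison with (d/dx)(cx).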
 
  • #6
Stephen Tashi said:
Is that the definition you use?
Yes! However I would like to solve it assuming I don't know what the answer is to be.
RUber said:
Looking at the derivative with respect to the first term (1,1), you could use the limit definition to see what happens in the matrix multiplication.
## \lim_{h\to 0} \frac{f(X+\begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix})-f(X)}{h} = ? ##
I know you are sort of using the definition of a derivative but I don't get why you have a matrix with h in the top left corner.
Ray Vickson said:
He should not use any of those things; it is just a straightforward matter, like saying ##(d/dx) (cx) = x## for constant ##c##. In fact,
[tex] f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j [/tex]
I have a couple questions about what you wrote, if I may.

##(d/dx) (cx) = x## for constant ##c## should this not be ##(d/dx) (cx) = c## for constant ##c## ?
For your equation of f(x): [tex] f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j [/tex]
shouldn't the subscripts of x be reversed (ji instead of ij)?
Also how did the x go away : ( ??

Thank you so much!
 
  • #7
perplexabot said:
Yes! However I would like to solve it assuming I don't know what the answer is to be.

I know you are sort of using the definition of a derivative but I don't get why you have a matrix with h in the top left corner.

I have a couple questions about what you wrote, if I may.

##(d/dx) (cx) = x## for constant ##c## should this not be ##(d/dx) (cx) = c## for constant ##c## ?
For your equation of f(x): [tex] f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j [/tex]
shouldn't the subscripts of x be reversed (ji instead of ij)?
Also how did the x go away : ( ??

Thank you so much!

Yes, it should have been ##(d/dx) (cx) = c##; I have edited out the error.

I don't understand the second question: reverse i and j where? What I wrote was ##a^T X b## in expanded form. And, I don't see why you ask why/how the ##x## went away; it didn't---it is still there. Perhaps you wonder where the ##x## went at the end of the displayed equation? Well, when I said ##c_{ij} = a_i b_j##, that was just the definition of ##c_{ij}##. In other words, I wrote the sum with a ##c_{ij}## in it, so I have to define ##c_{ij}## somewhere. Perhaps I should have said " ... where ##c_{ij} = a_i b_j##".
 
  • #8
Ray Vickson said:
Yes, it should have been ##(d/dx) (cx) = c##; I have edited out the error.

I don't understand the second question: reverse i and j where? What I wrote was ##a^T X b## in expanded form. And, I don't see why you ask why/how the ##x## went away; it didn't---it is still there. Perhaps you wonder where the ##x## went at the end of the displayed equation? Well, when I said ##c_{ij} = a_i b_j##, that was just the definition of ##c_{ij}##. In other words, I wrote the sum with a ##c_{ij}## in it, so I have to define ##c_{ij}## somewhere. Perhaps I should have said " ... where ##c_{ij} = a_i b_j##".

Sorry! My last question was wrong. I misread your equation as f(X) = aibj, so that is my fault.
OK, I think I understand your equation now.

But what next? Product rule and chain rule? Or do I simply take the derivative of ##c_{ij}x_{ij}## with respect to ##x_{ij}##? If I do the latter, I just get the sum of the ##c_{ij}## terms.
EDIT: Actually I am wrong once again! You don't get the sum of ##c_{ij}##. You get a column vector with each row being a derivative of ##c_{ij}x_{ij}## with respect to an ##x_{ij}##, right?

Thank you for your patience : )
 
  • #9
I was finally able to do this. I was trying to solve it without considering the elements of the matrix, which I now think is not possible. Here is my solution, for anyone who may be interested in the future. Thanks to everyone for the help.

[Attached image: gotIt.png]
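The attached image is not reproduced in this text. What follows is a sketch of an element-wise derivation consistent with the hints given in the thread, and not necessarily the steps shown in the attachment. It uses the entrywise definition of the derivative from post #2, with indices k and l (introduced here) marking the entry being differentiated:

[tex] \frac{\partial f}{\partial x_{kl}} = \frac{\partial}{\partial x_{kl}} \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = a_k b_l [/tex]

Only the term with i = k and j = l depends on [itex] x_{kl} [/itex], so every other term differentiates to zero. Collecting the partial derivatives into a matrix gives

[tex] \frac{\partial f}{\partial X} = \begin{pmatrix} a_1 b_1 & \cdots & a_1 b_n \\ \vdots & \ddots & \vdots \\ a_n b_1 & \cdots & a_n b_n \end{pmatrix} = a b^T [/tex]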
 

Related to Matrix derivative of quadratic form?

1. What is a quadratic form in matrix notation?

A quadratic form in matrix notation is a scalar expression of the form ##x^T A x##, where ##x## is a vector and ##A## is a square matrix. It is a homogeneous polynomial of degree two in the entries of ##x##, with coefficients given by the entries of ##A##. (The function ##f(X) = a^T X b## in the thread above is a bilinear form in ##a## and ##b##, and is linear in ##X##.)

2. Why is the derivative of a quadratic form important?

The derivative of a quadratic form is important because it allows us to find the rate of change of the quadratic form with respect to the variables in the vector. This can be useful in optimizing the quadratic form or finding critical points.

3. How do you find the derivative of a quadratic form?

To find the derivative of a quadratic form ##x^T A x## with respect to the vector ##x##, write it as a sum over components, ##\sum_{i,j} A_{ij} x_i x_j##, and differentiate term by term. The result is the vector ##(A + A^T)x##, which reduces to ##2Ax## when ##A## is symmetric.

4. Is the derivative of a quadratic form always a symmetric matrix?

No. The first derivative (gradient) of ##x^T A x## with respect to ##x## is the vector ##(A + A^T)x##, not a matrix. The second derivative (Hessian) is the matrix ##A + A^T##, which is always symmetric, even when ##A## itself is not.

5. Can the derivative of a quadratic form be used to find the minimum or maximum value?

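Yes, the derivative of a quadratic form can be used to locate candidates for a minimum or maximum, because any such point is a critical point where the derivative equals zero. Whether a critical point is a minimum, a maximum, or a saddle point depends on the matrix: for symmetric ##A##, ##x = 0## is a minimum if ##A## is positive semidefinite, a maximum if ##A## is negative semidefinite, and a saddle point if ##A## is indefinite.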
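For a quadratic form ##q(x) = x^T A x##, the standard result for the gradient with respect to ##x## is ##(A + A^T)x##. The following is a minimal sketch in plain Python that compares this formula with central differences; the function names and the test matrix are assumptions chosen for illustration, and the matrix is deliberately non-symmetric.

```python
def quadratic_form(A, x):
    """Return the scalar x^T A x for a square matrix A given as nested lists."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def analytic_gradient(A, x):
    """Return (A + A^T) x, the gradient of x^T A x with respect to x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]


def numerical_gradient(A, x, h=1e-6):
    """Approximate the gradient of x^T A x by central differences."""
    grad = []
    for k in range(len(x)):
        x_plus = x[:]
        x_minus = x[:]
        x_plus[k] += h    # step forward in coordinate k only
        x_minus[k] -= h   # step backward in coordinate k only
        diff = quadratic_form(A, x_plus) - quadratic_form(A, x_minus)
        grad.append(diff / (2 * h))
    return grad


# Assumed test data: a non-symmetric A, so (A + A^T) x differs from 2 A x.
A = [[1.0, 2.0],
     [0.0, 3.0]]
x = [1.0, -1.0]

g_exact = analytic_gradient(A, x)
g_approx = numerical_gradient(A, x)
print(g_exact, g_approx)
```

With this choice of A, the vector 2Ax would be [-2, -6], so the check also shows that the shortcut ##2Ax## is only valid for symmetric matrices.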
