How do I apply Chain Rule to get the desired result?

bwest121 · Jan 15, 2017

I'm reading a textbook that says:

"The directional derivative in direction ##u## is the derivative of the function ##f( \mathbf x + \alpha \mathbf u)## with respect to ##\alpha##, evaluated at ##\alpha=0##. Using the chain rule, we can see that ##\frac {\partial}{\partial \alpha} f( \mathbf x + \alpha \mathbf u)## evaluates to ##\mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)## when ##\alpha = 0##."

I understand that the directional derivative is the dot product of the gradient function and the direction vector. However, I don't fully see how to get the result through using the chain rule.

Here's my attempt:
$$\frac {\partial}{\partial \alpha} f(\mathbf x + \alpha\mathbf u) = \frac {\partial f}{\partial \alpha} \cdot \frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha}$$

I know that ##\frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha} = \mathbf u## either by applying the limit definition of the derivative or by decomposing the ##(\mathbf x + \alpha\mathbf u)## vector and applying ##\frac{\partial}{\partial\alpha}## to each component, thereby eliminating the components of ##\mathbf x## and leaving only ##\mathbf u##. Thus, I'll be dotting ##\mathbf u## with ##\frac {\partial f}{\partial \alpha}##, i.e., ##\mathbf u^\intercal \frac {\partial f}{\partial \alpha}.## However, how does $$\frac {\partial f}{\partial \alpha} = \nabla_\mathbf x f(\mathbf x)?$$

Orodruin · Jan 15, 2017

You have used the chain rule in the (wrong) form ##\frac{df}{dx} = \frac{df}{dx}\frac{dy}{dx}##. The chain rule is ##\frac{df}{dx} = \frac{df}{dy}\frac{dy}{dx}##. If there are several intermediate variables ##y_i##, you get a sum over them, and the derivatives of ##f## become the partial derivatives ##\frac{\partial f}{\partial y_i}##.
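Spelled out for the expression in the question, the multivariable chain rule goes roughly as follows. This is a sketch in three dimensions; the components ##x_i## of ##\mathbf x## and ##u_i## of ##\mathbf u##, and the shorthand ##\partial_i f## for "the partial derivative of ##f## with respect to its ##i##-th argument", are notation introduced here:

$$\frac{d}{d\alpha} f(\mathbf x + \alpha\mathbf u) = \sum_{i=1}^{3} \partial_i f(\mathbf x + \alpha\mathbf u)\,\frac{d(x_i + \alpha u_i)}{d\alpha} = \sum_{i=1}^{3} u_i\,\partial_i f(\mathbf x + \alpha\mathbf u)$$

Setting ##\alpha = 0##:

$$\frac{d}{d\alpha} f(\mathbf x + \alpha\mathbf u)\Big|_{\alpha=0} = \sum_{i=1}^{3} u_i\,\partial_i f(\mathbf x) = \mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)$$

The factor playing the role of ##\frac{df}{dy}## is ##\partial_i f##, the derivative of ##f## with respect to its own arguments, not ##\frac{\partial f}{\partial \alpha}##.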

PeroK · Jan 15, 2017

The main issue is your understanding of a partial derivative. A scalar function of a vector ##\mathbf{x}## in three dimensions is really a function of three variables, ##f(x, y, z)##. Now, for each of these variables, you can take the partial derivative wrt that variable, leaving the others fixed. The result is another function of the three variables. There are various notations for these functions, but normally it's ##f_x, f_y, f_z## or ##\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}##.

Both these notations create something of a problem (that is rarely discussed, I feel). They tie the definition of these partial derivative functions to a particular choice of variable. And, if you start changing variables in some way, it can be difficult to understand what the partial derivatives actually mean.

There are two alternatives that make things clearer. With ##f## defined as a function of ##(x, y, z)##, then:

##f_x = \frac{\partial f}{\partial x} = ## "the partial derivative of ##f## wrt its first argument", which could be written ##f_1##, say.

Now, if you defined a function ##g(x, y, z) = f(x^2, 2xy, x+z)##, then what is ##g_x##?

The solution is to see the chain rule as:

##g_x = ## "the partial derivative of ##f## wrt its first argument times the partial derivative of its first argument with respect to ##x##" + "the partial derivative of ##f## wrt its second argument times the partial derivative of its second argument with respect to ##x##" + "the partial derivative of ##f## wrt its third argument times the partial derivative of its third argument with respect to ##x##".

Now, in my new notation this is quite clear:

##g_x = f_1 \cdot 2x + f_2 \cdot 2y + f_3 \cdot 1##, where ##f_1, f_2, f_3## are all evaluated at ##(x^2, 2xy, x+z)##.

Or, in the more usual notation this is:

##g_x = f_x 2x + f_y 2y + f_z##

I think this is worth remembering, as it can be very useful in clearing up any confusion over partial derivatives.
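A quick numerical sanity check of ##g_x## can be sketched in Python. It assumes a hypothetical ##f(a, b, c) = ab + \sin c##, picked only so that ##f_1 = b##, ##f_2 = a## and ##f_3 = \cos c## are easy to write down, and compares the chain-rule formula with a central finite difference in ##x##:

```python
import math

# Hypothetical f, chosen only for illustration: f(a, b, c) = a*b + sin(c)
def f(a, b, c):
    return a * b + math.sin(c)

# Partial derivatives of f wrt its 1st, 2nd and 3rd arguments (f_1, f_2, f_3)
def f1(a, b, c):
    return b

def f2(a, b, c):
    return a

def f3(a, b, c):
    return math.cos(c)

def g(x, y, z):
    # g(x, y, z) = f(x^2, 2xy, x + z), as in the post
    return f(x**2, 2 * x * y, x + z)

def g_x_chain_rule(x, y, z):
    # Evaluate f_1, f_2, f_3 at the *arguments* (x^2, 2xy, x + z), then
    # weight each by the derivative of that argument wrt x: 2x, 2y, 1.
    args = (x**2, 2 * x * y, x + z)
    return 2 * x * f1(*args) + 2 * y * f2(*args) + 1 * f3(*args)

def g_x_numeric(x, y, z, h=1e-6):
    # Central finite difference in x, holding y and z fixed
    return (g(x + h, y, z) - g(x - h, y, z)) / (2 * h)

if __name__ == "__main__":
    point = (0.7, -1.3, 0.4)
    print(g_x_chain_rule(*point), g_x_numeric(*point))
```

For this ##f##, ##g(x, y, z) = 2x^3 y + \sin(x + z)##, so both functions should agree with ##6x^2 y + \cos(x + z)## up to finite-difference error. The important detail is that ##f_1, f_2, f_3## are evaluated at ##(x^2, 2xy, x+z)##, not at ##(x, y, z)##.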

Finally, how I would analyse your example is, with ##\mathbf x## and ##\mathbf u## fixed, we define:

##g(\alpha) = f(\mathbf x + \alpha \mathbf u) = f(x + \alpha u_x, y + \alpha u_y, z + \alpha u_z)##

And:

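##\frac{dg}{d \alpha} = f_x u_x + f_y u_y + f_z u_z = \mathbf{\nabla}f \cdot \mathbf{u}##, where the partial derivatives are evaluated at ##\mathbf x + \alpha \mathbf u##.

And, as you want the derivative evaluated at ##\mathbf x = (x, y, z)## itself, you take ##\alpha = 0##, which gives ##\nabla f(\mathbf x) \cdot \mathbf u = \mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)##.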
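The same conclusion can be checked numerically. The Python sketch below assumes a hypothetical field ##f(x, y, z) = x^2 y + e^z \sin x## with its gradient written out by hand, builds ##g(\alpha) = f(\mathbf x + \alpha\mathbf u)##, and compares a central finite difference of ##g## at ##\alpha = 0## with ##\mathbf u^\intercal \nabla f(\mathbf x)##:

```python
import math

# Hypothetical scalar field on R^3, chosen only for illustration
def f(p):
    x, y, z = p
    return x**2 * y + math.exp(z) * math.sin(x)

def grad_f(p):
    # Hand-computed gradient (f_x, f_y, f_z) of the f above
    x, y, z = p
    return (
        2 * x * y + math.exp(z) * math.cos(x),
        x**2,
        math.exp(z) * math.sin(x),
    )

def g(alpha, x, u):
    # g(alpha) = f(x + alpha * u), with x and u held fixed
    return f(tuple(xi + alpha * ui for xi, ui in zip(x, u)))

def dg_dalpha_at_zero(x, u, h=1e-6):
    # Central finite difference of g at alpha = 0
    return (g(h, x, u) - g(-h, x, u)) / (2 * h)

def directional_derivative(x, u):
    # u^T grad f(x) = f_x u_x + f_y u_y + f_z u_z
    return sum(ui * gi for ui, gi in zip(u, grad_f(x)))

if __name__ == "__main__":
    x = (0.5, -1.0, 0.2)
    u = (0.6, 0.0, 0.8)  # a unit vector
    print(dg_dalpha_at_zero(x, u), directional_derivative(x, u))
```

The two numbers should agree to within finite-difference error; the identity itself does not need ##\mathbf u## to be a unit vector, that only matters for reading the result as a rate of change per unit distance.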

FactChecker · Jan 15, 2017

bwest121 said:

Here's my attempt:
$$\frac {\partial}{\partial \alpha} f(\mathbf x + \alpha\mathbf u) = \frac {\partial f}{\partial \alpha} \cdot \frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha}$$

This is wrong. The first factor is not ##\frac {\partial f}{\partial \alpha}##.
The simple, one-variable version is ##\frac{df}{dx} = \frac{df}{du}\frac{du}{dx}##. Notice the ##\frac{df}{du}## rather than ##\frac{df}{dx}##.
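As a concrete one-variable instance (the functions are chosen only for illustration, and ##u## here is a scalar intermediate variable, not the direction vector ##\mathbf u##), take ##f(u) = \sin u## with ##u = x^2##:

$$\frac{df}{dx} = \frac{df}{du}\,\frac{du}{dx} = \cos(u)\cdot 2x = 2x\cos(x^2)$$

The first factor is ##\cos u##, the derivative of ##f## with respect to its own argument, evaluated at ##u = x^2##.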

bwest121 · Jan 15, 2017

Thank you so much. I very much appreciate you taking the time to provide such a thorough explanation. :)