How do I apply Chain Rule to get the desired result?

In summary, the directional derivative in direction ##\mathbf u## is the derivative of ##f(\mathbf x + \alpha \mathbf u)## with respect to ##\alpha##, evaluated at ##\alpha = 0##. Using the chain rule, ##\frac {\partial}{\partial \alpha} f(\mathbf x + \alpha \mathbf u)## evaluates to ##\mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)## when ##\alpha = 0##.
  • #1
bwest121
I'm reading a textbook that says:

"The directional derivative in direction ##u## is the derivative of the function ##f( \mathbf x + \alpha \mathbf u)## with respect to ##\alpha##, evaluated at ##\alpha=0##. Using the chain rule, we can see that ##\frac {\partial}{\partial \alpha} f( \mathbf x + \alpha \mathbf u)## evaluates to ##\mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)## when ##\alpha = 0##."

I understand that the directional derivative is the dot product of the gradient function and the direction vector. However, I don't fully see how to get the result through using the chain rule.

Here's my attempt:
$$\frac {\partial}{\partial \alpha} f(\mathbf x + \alpha\mathbf u) = \frac {\partial f}{\partial \alpha} \cdot \frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha}$$

I know that ##\frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha} = \mathbf u## either by applying the limit definition of the derivative or by decomposing the ##(\mathbf x + \alpha\mathbf u)## vector and applying ##\frac{\partial}{\partial\alpha}## to each component, thereby eliminating the components of ##\mathbf x## and leaving only ##\mathbf u##. Thus, I'll be dotting ##\mathbf u## with ##\frac {\partial f}{\partial \alpha}##, i.e., ##\mathbf u^\intercal \frac {\partial f}{\partial \alpha}.## However, how does $$\frac {\partial f}{\partial \alpha} = \nabla_\mathbf x f(\mathbf x)?$$
 
  • #2
You have used the chain rule on the (wrong) form df/dx = (df/dx)(dy/dx). The chain rule is df/dx = (df/dy)(dy/dx). If you have several variables y you get a sum over the variables and the derivatives of f will be the partial derivatives.
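
Written out for this problem, the multivariable form might look as follows. This is only a sketch: the name ##\mathbf y(\alpha) = \mathbf x + \alpha \mathbf u## for the inner function is introduced here for illustration and is not used elsewhere in the thread.

$$\frac{d}{d\alpha} f(\mathbf y(\alpha)) = \sum_i \frac{\partial f}{\partial y_i}\bigg|_{\mathbf y(\alpha)} \frac{d y_i}{d\alpha} = \sum_i \frac{\partial f}{\partial y_i}\bigg|_{\mathbf y(\alpha)} u_i$$

At ##\alpha = 0## the inner function is ##\mathbf y(0) = \mathbf x##, so the partial derivatives are evaluated at ##\mathbf x## and the sum becomes ##\mathbf u^\intercal \nabla_\mathbf x f(\mathbf x)##.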
 
  • #3
bwest121 said:
However, how does $$\frac {\partial f}{\partial \alpha} = \nabla_\mathbf x f(\mathbf x)?$$

The main issue is your understanding of a partial derivative. A scalar function of a vector ##\mathbf{x}## is (in three dimensions) a function of three variables, ##f(x, y, z)##. Now, for each of these variables, you can take the partial derivative wrt that variable, leaving the others fixed. The result is another function of the three variables. There are various notations for these functions, but normally it's ##f_x, f_y, f_z## or ##\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}##.

Both these notations create something of a problem (that is rarely discussed, I feel). They tie the definition of these partial derivative functions to a particular choice of variable. And, if you start changing variables in some way, it can be difficult to understand what the partial derivatives actually mean.

There is an alternative that makes things clearer. With ##f## defined as a function of ##(x, y, z)##, then:

##f_x = \frac{\partial f}{\partial x} = ## "the partial derivative of ##f## wrt its first argument", which could be written ##f_1##, say.

Now, if you defined a function ##g(x, y, z) = f(x^2, 2xy, x+z)##, then what is ##g_x##?

The solution is to see the chain rule as:

##g_x = ## "the partial derivative of ##f## wrt its first argument times the partial derivative of its first argument with respect to ##x##" + "the partial derivative of ##f## wrt its second argument times the partial derivative of its second argument with respect to ##x##" + "the partial derivative of ##f## wrt its third argument times the partial derivative of its third argument with respect to ##x##".

Now, in my new notation this is quite clear:

##g_x = f_1 \cdot 2x + f_2 \cdot 2y + f_3##

Or, in the more usual notation this is:

##g_x = f_x 2x + f_y 2y + f_z##
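
A quick numerical check of this formula can be sketched in Python. The particular ##f## below is a made-up example (the thread leaves ##f## unspecified), and the names `f1`, `f2`, `f3`, `g_x_chain` and `g_x_numeric` are hypothetical helpers standing for "partial derivative with respect to the first, second, third argument" and the two ways of computing ##g_x##.

```python
import math

# Hypothetical example function of three positional arguments (not from the thread).
def f(a, b, c):
    return math.sin(a) * b + c ** 2

# Partial derivatives of f with respect to its 1st, 2nd and 3rd argument.
def f1(a, b, c):
    return math.cos(a) * b

def f2(a, b, c):
    return math.sin(a)

def f3(a, b, c):
    return 2 * c

# g(x, y, z) = f(x^2, 2xy, x + z), as in the post.
def g(x, y, z):
    return f(x ** 2, 2 * x * y, x + z)

def g_x_chain(x, y, z):
    # Each f_i is evaluated at the arguments actually passed to f,
    # then multiplied by the x-derivative of that argument: 2x, 2y and 1.
    args = (x ** 2, 2 * x * y, x + z)
    return f1(*args) * 2 * x + f2(*args) * 2 * y + f3(*args) * 1

def g_x_numeric(x, y, z, h=1e-6):
    # Central difference quotient in x, with y and z held fixed.
    return (g(x + h, y, z) - g(x - h, y, z)) / (2 * h)

x, y, z = 0.7, -1.3, 0.4
chain_value = g_x_chain(x, y, z)
numeric_value = g_x_numeric(x, y, z)
print(chain_value, numeric_value)
```

The two printed values should agree to several decimal places. Note that each ##f_i## is evaluated at ##(x^2, 2xy, x+z)##, not at ##(x, y, z)##, which is exactly the point the positional notation makes explicit.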

I think this is worth remembering as it can be very useful in clearing up any confusion over partial derivatives.

Finally, how I would analyse your example is, with ##\mathbf x## and ##\mathbf u## fixed, we define:

##g(\alpha) = f(\mathbf x + \alpha \mathbf u) = f(x + \alpha u_x, y + \alpha u_y, z + \alpha u_z)##

And:

##\frac{dg}{d \alpha} = f_x u_x + f_y u_y + f_z u_z = \mathbf{ \nabla}f \cdot \mathbf{u}##

And, as you want the derivative evaluated at ##\mathbf x = (x, y, z)## you take ##\alpha = 0##.
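
The same result can be checked numerically. The sketch below assumes an arbitrary example ##f## (chosen here for illustration, not taken from the textbook) and compares the derivative of ##g(\alpha)## at ##\alpha = 0##, estimated with a central difference, against ##\nabla f \cdot \mathbf u##. The helper names are hypothetical.

```python
import math

# Hypothetical scalar function of (x, y, z); any differentiable f would do.
def f(x, y, z):
    return x ** 2 * y + math.exp(z)

# Its gradient, computed by hand: (f_x, f_y, f_z).
def grad_f(x, y, z):
    return (2 * x * y, x ** 2, math.exp(z))

# g(alpha) = f(x + alpha * u), with the point p and direction u held fixed.
def g(alpha, p, u):
    return f(*(pi + alpha * ui for pi, ui in zip(p, u)))

def directional_numeric(p, u, h=1e-6):
    # Central difference estimate of dg/d(alpha) at alpha = 0.
    return (g(h, p, u) - g(-h, p, u)) / (2 * h)

def directional_gradient(p, u):
    # Dot product of the gradient at p with the direction u.
    return sum(gi * ui for gi, ui in zip(grad_f(*p), u))

p = (1.0, 2.0, 0.5)   # the point x
u = (0.6, 0.0, 0.8)   # a unit direction vector
numeric_value = directional_numeric(p, u)
gradient_value = directional_gradient(p, u)
print(numeric_value, gradient_value)
```

Both numbers should match up to the finite-difference error, which illustrates that differentiating along the line ##\mathbf x + \alpha \mathbf u## and dotting the gradient with ##\mathbf u## are the same operation.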
 
  • #4
bwest121 said:
Here's my attempt:
$$\frac {\partial}{\partial \alpha} f(\mathbf x + \alpha\mathbf u) = \frac {\partial f}{\partial \alpha} \cdot \frac {\partial (\mathbf x + \alpha\mathbf u)}{\partial \alpha}$$
This is wrong. It is not $$\frac {\partial f}{\partial \alpha} $$
The simple, one variable version is df/dx = df/du * du/dx. Notice the df/du rather than df/dx.
 
  • #5
PeroK said:
The main issue is your understanding of a partial derivative. [...]

Thank you so much. I very much appreciate you taking the time to provide such a thorough explanation. :)
 

Related to How do I apply Chain Rule to get the desired result?

1. What is the Chain Rule and why is it important in mathematics?

The Chain Rule is a formula used in calculus to find the derivative of composite functions. It allows us to break down complex functions into smaller, more manageable parts and find the rate of change for each individual part. It is important because it is used in many real-world applications, such as finding the velocity of moving objects or the growth rate of populations.

2. How do I identify when to use the Chain Rule?

The Chain Rule is used when you have a function within another function. In other words, when you have a composite function, where the output of one function becomes the input of another. You can also think of it as a function within a function within a function, and so on.

3. How do I apply the Chain Rule step-by-step?

Step 1: Identify the inner and outer functions in the composite function.
Step 2: Take the derivative of the outer function, treating the inner function as a variable.
Step 3: Multiply by the derivative of the inner function.
Step 4: Simplify the resulting expression, if possible.
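
As a worked instance of these four steps, take ##h(x) = \sin(x^2)##. This function is an illustrative choice and does not come from the thread. The outer function is ##\sin(v)## and the inner function is ##v = x^2##, so:

$$\frac{dh}{dx} = \frac{d}{dv}\sin(v)\bigg|_{v = x^2} \cdot \frac{d}{dx}\left(x^2\right) = \cos(x^2) \cdot 2x = 2x\cos(x^2)$$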

4. Can the Chain Rule be applied to any type of function?

Yes, the Chain Rule can be applied to any composite function, provided the inner and outer functions are differentiable at the relevant points. This includes compositions of polynomial functions, trigonometric functions, exponential functions, and more.

5. How do I know if I have applied the Chain Rule correctly?

You can check if you have applied the Chain Rule correctly by evaluating your derivative at a few points and comparing it with a numerical difference quotient of the original function, such as ##(f(x+h) - f(x-h))/(2h)## for a small ##h##. You can also differentiate by another method, for example by expanding the composite function first, and confirm that the two results agree.
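
Such a spot check might look like the following minimal sketch. It uses an illustrative function ##h(x) = \sin(x^2)##, whose chain-rule derivative is ##2x\cos(x^2)##; the function and the helper names are assumptions made for the example.

```python
import math

# Illustrative composite function: outer sin(v), inner v = x^2.
def h(x):
    return math.sin(x ** 2)

# Derivative obtained by hand with the chain rule: cos(x^2) * 2x.
def h_prime_chain(x):
    return math.cos(x ** 2) * 2 * x

def h_prime_numeric(x, step=1e-6):
    # Central difference quotient of the original function.
    return (h(x + step) - h(x - step)) / (2 * step)

# Compare the two at a handful of sample points.
points = [-1.5, 0.0, 0.3, 2.0]
max_error = max(abs(h_prime_chain(x) - h_prime_numeric(x)) for x in points)
print(max_error)
```

A small maximum error suggests the hand-derived expression is right; a large one at some point usually means a factor from the inner function was dropped.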
