OLS regression - using an assumption as the proof?

In summary, the conversation discusses setting the partial derivatives of a function to zero to find its minimum or maximum, and how those same first-order conditions show that the mean of the residuals from a least-squares regression line is zero. An analogous property holds for the one-sample arithmetic mean, which minimizes the sum of squared deviations.
  • #1
musicgold
Hi,

My question is about a common procedure used to find minimum and maximum values of a function. In many problems we find the first derivative of a function and then equate it to zero. I understand the use of this method when one is trying to find the minimum or maximum value of the function.

However, I get confused when I see people using that ‘equating to 0’ assumption as a proof for something else.

To better explain my question, I have attached a file here. The file has equations used in deriving the coefficients of a least-square regression line.

The OLS method starts with the partial differentiation of equation 3.1.2, then equates the derivatives to 0 and solves them to get the coefficients. I follow it up to this point.

However, in the last section, to prove that the sum of the residuals is 0, the author uses terms from partial differentiation as the proof.

I don’t understand how an assumption can be used as the proof for something.

Thanks,

MG.
 

Attachments

  • regression eqn.doc
  • #2
I looked at your attachment. I do not see any "assumption used as a proof". What "assumption" are you talking about?
 
  • #3
HallsofIvy,

Thanks.

Equation 1, in the middle section of the attachment, is a partial derivative of Eqn 3.1.2 (in the top section), with respect to β1. Then Eqn 1, along with Eqn 2, is equated to zero to get the values of β1 and β2 (estimates).

Isn’t equating Eqns 1 and 2 to zero an assumption used only to get the values of β1 and β2?
And if that is an assumption, why is it being used to prove u = 0, as in the last section of the attachment?

MG.
 
  • #4
Is there some other page where the author(s) show that "the mean value of the residuals is zero", as they state on the bottom portion of the page you attached? If not, the writing is poor.

However, nothing is really cyclic: you have

[tex]
S(\hat{\beta}_1, \hat{\beta}_2) = \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right)^2
[/tex]

and you want to find the [tex] \hat{\beta}_1, \hat{\beta}_2 [/tex] pair that minimizes it. Since it is a very nice function (polynomial in two variables), the usual calculus-based methods can be used to find them. The first steps are to find the two partial derivatives, set them to zero, and solve - exactly what is discussed. Setting the partial derivatives to zero gives
[tex]
\begin{align*}
\frac{\partial S}{\partial \hat{\beta}_1} & = -2 \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i \right) = 0 \\
\frac{\partial S}{\partial \hat{\beta}_2} & = -2 \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right) X_i = 0
\end{align*}
[/tex]

The first equation of my final pair shows that the sum (and so the mean) of the residuals is zero, and the final equation corresponds to 3A.1 equation 2.
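
As a quick numerical check (a minimal sketch of my own, not part of the attachment; the data are made up and it assumes NumPy is available), you can solve those two equations directly and see that the residuals do sum to zero up to floating-point rounding:

[code]
import numpy as np

# Made-up data purely for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(X)

# The two normal equations from setting the partial derivatives to zero:
#   sum(Y - b1 - b2*X)       = 0
#   sum((Y - b1 - b2*X) * X) = 0
# rearranged into a 2x2 linear system in (b1, b2).
A = np.array([[n,       X.sum()],
              [X.sum(), (X**2).sum()]])
b = np.array([Y.sum(), (X * Y).sum()])
beta1_hat, beta2_hat = np.linalg.solve(A, b)

residuals = Y - beta1_hat - beta2_hat * X
print(residuals.sum())   # ~0, up to floating-point rounding
print(residuals.mean())  # hence the mean residual is ~0 as well
[/code]

The point is that the residual sum is zero by construction: it is the first normal equation itself, evaluated at the solution.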
 
  • #5
statdad,

Thanks.

statdad said:
Is there some other page where the author(s) show that "the mean value of the residuals is zero",

I think that is what the author is trying to prove here (see the last line of my attachment).

Now back to my question.


Let me first explain what I understand.

By equating those partial derivative terms to zero, we are looking at the points where the surface or function reaches a maximum or minimum with respect to β1 and β2, and I understand that. Basically, we are using two known points on the function to find the values of two unknowns.

What I don’t understand is how we can use the same method (of equating a differential equation to zero) to prove that the mean value of residuals is zero.

I guess I am not able to interpret this geometrically. What are we trying to say here: that at the point where the partial derivative w.r.t. β1 is zero, the expected value of the residual term is also zero, and therefore it is zero at all other points on the surface? (Not sure if I am making sense here.)

Thanks,

MG.
 
  • #6
First, note that I had two typos in my earlier post (I have fixed them). I neglected to write [tex] X_i [/tex] in the two partial derivatives.

Now, the way this estimation is usually approached (that is, the way I learned it and the way I often teach it) is to say: OK, we have data, and we want to estimate the slope and intercept with least squares. Since every linear equation can be written in the form

[tex]
Y = a + bx
[/tex]

let's try to find the values of [tex] a, b [/tex] that will minimize this expression:

[tex]
S(a,b) = \sum_{i=1}^n \left(Y_i - (a + bX_i)\right)^2
[/tex]

- this is simply the sum of the squared vertical distances between the points and the line. We can find the values that minimize this with simple calculus.

[tex]
\begin{align*}
\frac{\partial S}{\partial a} & = -2 \sum_{i=1}^n \left(Y_i - (a + bX_i) \right) = 0\\
\frac{\partial S}{\partial b} & = -2 \sum_{i=1}^n \left(Y_i - (a + bX_i) \right)X_i = 0
\end{align*}
[/tex]

The solutions to these equations are found by simple algebra - these solutions are the [tex] \hat{\beta}_1 [/tex] and [tex] \hat{\beta}_2 [/tex] values. With these values, the first partial derivative above shows that

[tex]
\left. \frac{\partial S}{\partial a}\right|_{a=\hat{\beta}_1, b = \hat{\beta}_2} = \sum_{i=1}^n \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i \right) = 0
[/tex]

This is the point where we see that the sum of the least squares residuals equals zero. Since the sum of the residuals is zero, the mean (as in arithmetic mean) of the sample residuals is zero.
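
For reference, carrying out that algebra explicitly gives the familiar closed-form estimates (not shown on the attached page, but standard):

[tex]
\hat{\beta}_2 = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^n (X_i - \bar X)^2}, \qquad
\hat{\beta}_1 = \bar Y - \hat{\beta}_2 \bar X.
[/tex]

Either way, it is the act of choosing [tex] \hat{\beta}_1, \hat{\beta}_2 [/tex] to satisfy the first equation that forces the residuals to sum to zero.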

This is similar to a property of the one-sample arithmetic mean: [tex] \bar x [/tex] is the value for which

[tex]
\sum_{i=1}^n (x_i - \bar x)^2
[/tex]

is minimized, and as a consequence it is easy to show that

[tex]
\sum_{i=1}^n (x_i - \bar x) = 0
[/tex]
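
To make that last step explicit, treat the centre as a free variable [tex] c [/tex] and set the derivative to zero:

[tex]
\frac{d}{dc} \sum_{i=1}^n (x_i - c)^2 = -2 \sum_{i=1}^n (x_i - c) = 0
\quad\Longrightarrow\quad
c = \frac{1}{n}\sum_{i=1}^n x_i = \bar x,
[/tex]

so the minimizer is exactly [tex] \bar x [/tex], and at that minimizer the deviations sum to zero - the one-variable version of what happens with the regression residuals.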

Did I get closer to answering your question this time?
 
  • #7
musicgold said:
What I don’t understand is how we can use the same method (of equating a differential equation to zero) to prove that the mean value of residuals is zero

It is because the beta-hat coefficients are chosen such that the first-order derivatives equal zero.

In other words: the last equation is zero precisely because the beta-hat coefficients have been chosen (i.e. solved) to satisfy equations (1) and (2), as statdad has shown.

[BTW, is this a D.E. question? I would have put it under calculus.]
 
  • #8
statdad and EnumaElish,

Thanks a lot. It is clear to me now.

MG.
 

Related to OLS regression - using an assumption as the proof?

1. What is OLS regression and how is it used in scientific research?

OLS (ordinary least squares) regression is a statistical method used to analyze the relationship between a dependent variable and one or more independent variables. It is commonly used in scientific research to identify patterns, make predictions, and test hypotheses.

2. What is the assumption used in OLS regression and how is it used as proof?

OLS regression rests on several assumptions, the most prominent being that the relationship between the dependent and independent variables is linear in the parameters. The assumptions themselves do not prove anything; rather, results such as the residuals summing to zero follow from the first-order conditions that define the estimates, while the assumptions determine when the fitted model and the inferences drawn from it are valid.

3. Can OLS regression be used if the assumption of linearity is not met?

No, OLS regression is not appropriate if the assumption of linearity is not met. In such cases, alternative methods, such as non-linear regression or transformation of variables, may be used.

4. What are the consequences of violating the assumption of linearity in OLS regression?

If the assumption of linearity is violated, the results of the regression analysis may be biased and unreliable. This can lead to incorrect conclusions and interpretations.

5. How can researchers ensure that the assumption of linearity is met in OLS regression?

Researchers can ensure the assumption of linearity is met by visually inspecting the data for a linear relationship, using diagnostic tests, and if necessary, transforming the variables to achieve linearity.
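
As an illustration of such a visual check (a minimal sketch, assuming NumPy and Matplotlib; the data are simulated for the example), a residuals-versus-fitted plot should show a roughly patternless scatter around zero if a straight line is adequate; a clear curve or fan shape suggests it is not:

[code]
import numpy as np
import matplotlib.pyplot as plt

# Simulated data for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=x.size)

# Fit a straight line by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

# Residuals vs. fitted values: systematic structure here would
# indicate that the linear form is misspecified.
plt.scatter(fitted, residuals)
plt.axhline(0.0, color="gray", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
[/code]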
