Finding the Uncertainty of the Slope Parameter of a Linear Regression

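In summary: The formula given for b is the slope estimator of ordinary least squares regression, which assumes no error in the measurement of the x_i. When the x_i also carry measurement error, people often use "total least squares" regression instead, and its slope is computed differently.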
  • #1
richardc
Finding the Uncertainty of the Slope Parameter of a Linear Regression

Suppose I have measurements [itex]x_i \pm \sigma_{xi}[/itex] and [itex]y_i \pm \sigma_{yi}[/itex] where [itex]\sigma[/itex] is the uncertainty in the measurement. If I use a linear regression to estimate the value of [itex]b[/itex] in [itex]y=a+bx[/itex], I'm struggling to find a straightforward way to compute the uncertainty of [itex]b[/itex] that arises from the measurement uncertainties. This seems like it should be a very common problem, so I'm not sure why I can't find a simple algorithm or formula.

Thank you for any advice.
 
  • #2
Are you using "uncertainty" to mean "standard deviation"?

It's a common problem, but it's not simple. After all, your data gives only one value for [itex] b [/itex], so how can you estimate the standard deviation of [itex] b [/itex] from a sample of size 1 ?

The common way to get an answer is to simplify matters and compute a "linearized asymptotic" estimate. The value of [itex] b [/itex] is some function [itex] F [/itex] of the [itex] (x_i,y_i) [/itex]. Let [itex] L [/itex] be the linear approximation of [itex] F [/itex] about the observed values. Assume that, near the observed values in the sample, [itex] L [/itex] approximates [itex] F [/itex] well, so that the random variable [itex] b [/itex] is approximately a linear combination of the [itex] x_i [/itex] and [itex] y_i[/itex]. When a random variable is expressed as a linear combination of other random variables, you can express its standard deviation in terms of the standard deviations of those other random variables.

That's the general picture. If it's what you want to do then we can try to look up the specifics. I don't know them from memory.
 
  • #3
Thank you for clarifying the problem.

With N observation pairs I believe I can write [itex]b=\frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - (\sum x_i)^2}[/itex].
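As a quick check of the formula above, here is a minimal Python sketch that evaluates it directly. The function name `ols_slope` and the sample data are made up for illustration and are not from the thread.

```python
def ols_slope(xs, ys):
    """Slope b of y = a + b*x, evaluated with the sum formula quoted above."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)


# Hypothetical measurements, roughly following y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
b = ols_slope(xs, ys)
print(b)
```

Note that the measurement uncertainties do not enter this estimate at all; they only appear once the error in b is propagated.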

I suppose the propagation of error formula [itex]\sigma_f^2=\sum (\frac{\partial f}{\partial x_i} \sigma_{x_i} )^2[/itex] is then applied to a linear approximation of b?
 
  • #4
You state a problem where there is an error in measurement for [itex] x_i [/itex] as well as for [itex] y_i [/itex]. In such a problem, people often use "total least squares" regression. I think the computation of the slope in "total least squares" regression is different from that in ordinary least squares regression, which assumes no error in the measurement of the [itex] x_i [/itex]. I think the formula you gave for [itex] b [/itex] is for ordinary least squares regression.

Of course, one may ask the question: If I fit a straight line to data using the estimator for slope used in ordinary least squares regression and my data also has errors in the [itex] x_i [/itex], then what is the standard deviation of this estimator? If that's the question, you need terms involving [itex] \left(\frac{\partial f}{\partial y_i}\right)^2 \sigma^2_{y_i} [/itex] and [itex] \left(\frac{\partial f}{\partial x_i}\right)^2 \sigma^2_{x_i} [/itex].
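A minimal Python sketch of that linearized propagation follows, assuming independent measurement errors that are small enough for the first-order approximation to hold. It writes the ordinary least squares slope in the equivalent centered form [itex]b = S_{xy}/S_{xx}[/itex], for which [itex]\partial b/\partial y_i = (x_i - \bar{x})/S_{xx}[/itex] and [itex]\partial b/\partial x_i = \left[(y_i - \bar{y}) - 2b(x_i - \bar{x})\right]/S_{xx}[/itex]. The function name `slope_and_sigma` and the data are hypothetical.

```python
import math


def slope_and_sigma(xs, ys, sig_x, sig_y):
    """OLS slope b and its first-order propagated standard deviation.

    Assumes independent errors with standard deviations sig_x[i], sig_y[i].
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx

    var_b = 0.0
    for x, y, sx, sy in zip(xs, ys, sig_x, sig_y):
        db_dy = (x - xbar) / sxx                         # partial of b w.r.t. y_i
        db_dx = ((y - ybar) - 2.0 * b * (x - xbar)) / sxx  # partial of b w.r.t. x_i
        var_b += (db_dy * sy) ** 2 + (db_dx * sx) ** 2
    return b, math.sqrt(var_b)


# Hypothetical data with equal uncertainties on every point.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
b, sigma_b = slope_and_sigma(xs, ys, [0.05] * 4, [0.1] * 4)
print(b, sigma_b)
```

With all the x uncertainties set to zero and a common y uncertainty σ, the result reduces to [itex]\sigma/\sqrt{S_{xx}}[/itex], which is a useful sanity check.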

I don't know if the estimator for slope in ordinary least squares regression is an unbiased estimator if there are errors in the [itex] x_i [/itex].
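One way to probe this question numerically is a small simulation. The Python sketch below is only an illustration under assumed conditions (fixed true x values, Gaussian errors added to x only, no error in y); the function names and parameter values are hypothetical. The usual textbook result for this setup is that the fitted slope is pulled toward zero as the x errors grow, and the simulation is consistent with that.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope in centered form."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sxx


def mean_fitted_slope(true_slope, sigma_x, n_trials=500, seed=0):
    """Average OLS slope over many simulated data sets with noisy x."""
    rng = random.Random(seed)
    x_true = [0.2 * i for i in range(51)]         # true x values from 0 to 10
    ys = [1.0 + true_slope * x for x in x_true]   # exact y values, no y error
    total = 0.0
    for _ in range(n_trials):
        x_obs = [x + rng.gauss(0.0, sigma_x) for x in x_true]
        total += ols_slope(x_obs, ys)
    return total / n_trials


print(mean_fitted_slope(2.0, 0.0))  # no x error
print(mean_fitted_slope(2.0, 2.0))  # large x error
```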
 
  • #5


I understand your frustration with finding a straightforward way to compute the uncertainty of the slope parameter in a linear regression. However, this is a common problem in statistics and there are several methods that can be used to estimate the uncertainty of the slope parameter.

One approach is to use the standard error of the slope, which is calculated by dividing the residual standard deviation (the square root of the sum of squared differences between the observed values and the values predicted by the regression line, divided by N - 2) by the square root of the sum of squares of the differences between the x-values and their mean. This can be calculated using statistical software or by hand. Note that this estimate is based on the scatter of the data about the line and does not use the stated measurement uncertainties directly.
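The calculation just described can be sketched in a few lines of Python. The function name `slope_standard_error` and the data are made up for illustration, and the sketch assumes the usual N - 2 degrees of freedom for a straight-line fit with an intercept.

```python
import math


def slope_standard_error(xs, ys):
    """Return the OLS slope b and its standard error."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    # Residual sum of squares about the fitted line.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_res / (n - 2))   # residual standard deviation
    return b, s / math.sqrt(sxx)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
b, se_b = slope_standard_error(xs, ys)
print(b, se_b)
```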

Another method is to use bootstrapping, which involves resampling the data multiple times and calculating the slope parameter for each sample. The uncertainty of the slope can then be estimated from the distribution of these calculated values.
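A minimal sketch of a pairs bootstrap in Python is shown below. It resamples whole (x, y) pairs with replacement, which is one common variant and an assumption here; the function names, the number of resamples, and the data are hypothetical. Like the standard error above, it does not use the stated measurement uncertainties.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Ordinary least squares slope in centered form."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_sd(xs, ys, n_boot=2000, seed=0):
    """Standard deviation of the slope over resampled data sets."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if max(bx) == min(bx):
            continue  # all x equal: slope undefined, draw again
        slopes.append(ols_slope(bx, by))
    return statistics.stdev(slopes)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.8, 15.2, 16.9]
print(bootstrap_slope_sd(xs, ys))
```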

Additionally, if the errors about the regression line are approximately normally distributed, you can use the t-distribution with N - 2 degrees of freedom to calculate a confidence interval for the slope parameter. At the chosen confidence level, this interval gives a range of values that is expected to contain the true slope parameter.
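As a sketch of the confidence-interval calculation, the Python below takes the critical t-value as an input rather than computing it. The value 2.306 used in the example is the two-sided 95% critical value for 8 degrees of freedom from a standard t-table; the function name and the data are hypothetical.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """Confidence interval b +/- t_crit * SE(b) for the OLS slope."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ss_res / (n - 2) / sxx)   # standard error of the slope
    half_width = t_crit * se_b
    return b - half_width, b + half_width


# Ten hypothetical points, so N - 2 = 8 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
T_CRIT_95_DF8 = 2.306  # from a t-table, two-sided 95%, 8 degrees of freedom
lo, hi = slope_confidence_interval(xs, ys, T_CRIT_95_DF8)
print(lo, hi)
```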

Overall, it is important to consider the assumptions of your data and the appropriate method for estimating the uncertainty of the slope parameter in your specific case. I recommend consulting with a statistician or using reliable statistical software to ensure accurate and precise results.
 

Related to Finding the Uncertainty of the Slope Parameter of a Linear Regression

What is a linear regression?

A linear regression is a statistical method used to model the relationship between two variables, where one variable is treated as dependent on the other. In its ordinary least squares form, it finds the line of best fit that minimizes the sum of the squared vertical distances between the actual data points and the predicted values on the line.

What is the slope parameter in a linear regression?

The slope parameter in a linear regression is the value that represents the change in the dependent variable for every unit change in the independent variable.

Why is it important to find the uncertainty of the slope parameter?

Finding the uncertainty of the slope parameter allows us to determine the reliability of the estimated slope. It helps us understand the range of values that the slope could potentially take on, and therefore, the potential variability in the relationship between the variables.

How do you find the uncertainty of the slope parameter?

To find the uncertainty of the slope parameter, we first calculate the standard error of the slope. This is done by dividing the residual standard deviation (computed from the differences between the actual data points and the predicted values, with N - 2 degrees of freedom) by the square root of the sum of squares of the differences between the independent variable and its mean. The half-width of the confidence interval for the slope is then the standard error multiplied by the critical t-value for the desired confidence level.

What is the significance of the uncertainty of the slope parameter?

The uncertainty of the slope parameter is used to calculate the confidence interval for the slope. This interval gives a range of values that, at the chosen confidence level, is expected to contain the true slope. It is also used to assess the statistical significance of the relationship between the variables: an uncertainty that is large relative to the estimated slope may indicate a weak or non-significant relationship.
