Question about propagation error and linear regression?

In summary: there is a lot of jargon in this conversation that can be confusing, especially if you are not familiar with statistics and error analysis. The main ideas to keep in mind are standard deviation, confidence interval, and standard error, all of which describe the precision and accuracy of measurements. When propagating error, it is important to understand the underlying assumptions and methods used in the specific field or software, and to be clear about the terminology and definitions used in the context of the discussion. Clarifying these concepts and methods can improve understanding and communication.
  • #1
Justhanging
I have a couple of questions about this, and I was hoping someone with some stats knowledge could clarify.

First, when people report numbers such as 10 plus or minus 5, what does the 5 mean? Is it the standard deviation or the confidence interval or the variance? What is the relationship between all these terms?

Secondly, when a linear regression is done in Excel (or some other software) and the standard errors of the slope and intercept are calculated, how do I get from those values to the plus or minus value used above? Basically, what I'm asking is: how is the standard error related to the standard deviation, confidence interval, or plus or minus values?

Also, how do I use the propagation of error equations? What do I use for the uncertainty in each variable?

There is a lot of jargon here that I don't really understand, can someone clarify?
 
  • #2
Justhanging said:
I have a couple of questions about this, and I was hoping someone with some stats knowledge could clarify.

First, when people report numbers such as 10 plus or minus 5, what does the 5 mean? Is it the standard deviation or the confidence interval or the variance? What is the relationship between all these terms?

That's a good question and I don't think there is a universal answer. Depending on whether you are reading the specifications for a measuring device or the results of a political poll, the conventions about what "plus or minus" means may vary.


Secondly, when a linear regression is done in Excel (or some other software) and the standard errors of the slope and intercept are calculated, how do I get from those values to the plus or minus value used above? Basically, what I'm asking is: how is the standard error related to the standard deviation, confidence interval, or plus or minus values?

To me, "standard error" means "standard deviation". The Wikipedia article on it says it means the standard deviation of a sample statistic, so I suppose if you talk about "standard error", it means you have decided to view what you are measuring as a statistic. There is the usual ambiguity about statistical terms. For example, "standard deviation" might mean 1) the standard deviation of a random variable, or 2) a specific number computed from a sample of a random variable or 3) a formula for estimating the standard deviation from the values of a sample of a random variable.

(It's also a very interesting question how Excel (or other curve fitting software) arrives at a standard deviation for the parameters of the curve that it is fitting to data!)
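
For reference, here is a minimal sketch of the textbook formulas for a simple one-variable least-squares fit and the standard errors of its parameters, with made-up data. This is not a claim about what Excel computes internally, but most packages report these same quantities (scipy.stats.linregress, for example, returns the same slope standard error):

```python
import numpy as np

# Minimal sketch of the textbook formulas for simple (one-variable) least
# squares -- not necessarily what Excel does internally, but most packages
# report these same quantities.  x and y are made-up example data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)

# Ordinary least squares slope and intercept
x_bar, y_bar = x.mean(), y.mean()
Sxx = np.sum((x - x_bar) ** 2)
slope = np.sum((x - x_bar) * (y - y_bar)) / Sxx
intercept = y_bar - slope * x_bar

# Residual standard deviation (n - 2 degrees of freedom, two fitted parameters)
residuals = y - (intercept + slope * x)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# "Standard errors" of the fitted parameters: the standard deviations the
# estimates would have if the experiment were repeated with the same x values.
se_slope = s / np.sqrt(Sxx)
se_intercept = s * np.sqrt(1.0 / n + x_bar ** 2 / Sxx)

print(slope, se_slope, intercept, se_intercept)
```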

To get an authoritative answer, you must find out how the particular field in which you are working defines "plus or minus" as a specification of precision.

Also, how do I use the propagation of error equations? What do I use for the uncertainty in each variable?

I think someone can answer that question if you make it more specific.
 
  • #3
How do I propagate the standard error from the slope and intercept?

For example:

If I have the slope of a linear fit along with its standard error, and I'm interested in a value derived from the slope, how do I propagate its error from the slope to the value of interest?

X = (c1 - slope)/(c2)

where X is the value of interest and the c's are constants, which are assumed to be exact.

Hopefully I'm being clear enough but I want to propagate the error from the slope to the value of interest.
 
  • #4
Justhanging said:
How do I propagate the standard error from the slope and intercept?

For example:

If I have the slope of a linear fit along with its standard error, and I'm interested in a value derived from the slope, how do I propagate its error from the slope to the value of interest?

X = (c1 - slope)/(c2)

where X is the value of interest and the c's are constants, which are assumed to be exact.

Hopefully I'm being clear enough but I want to propagate the error from the slope to the value of interest.

Hey Justhanging and welcome to the forums.

There are textbooks on error analysis for data and for signals. Have you ever come across this?

The reason I mention this is that there are different models for propagating variance information about errors (including cumulative errors), which are based on whether measurements are independent or dependent. If they are independent, then you get the intuitive idea that the total variance looks like a sum of small variances, but if they are dependent, then that screws things up a little.
 
  • #5
Justhanging said:
Hopefully I'm being clear enough but I want to propagate the error from the slope to the value of interest.

The only way I can interpret "propagate the error" is that for a particular value of x, you want to know the standard deviation of the random variable that is the error between the predicted value of y and an observed value of y.

If that's what you want to do, then there remains the question of whether you want to do something that makes sense. To do something that makes sense, you (and I) would have to understand what Excel does.

In the first place, when you do linear regression using a software package, you generally get some output that gives you information about the distribution of the errors between the data and the regression line. If you assume all the errors are drawn independently from an identical distribution then you can probably get Excel to tell you the standard deviation of the errors and that standard deviation would apply to any prediction.

If you don't want to compute the standard deviation of the errors that way and wish instead to use Excel's value for the standard deviation of the regression coefficients, then we must figure out what exactly this standard deviation is. After all, you only have a single slope and intercept, so how did Excel get any data about the standard deviation of the slope or intercept?

In the example you gave, is it correct that you want to consider the standard deviation of the slope but not the intercept? Also, I don't see any provision in your equation that accounts for the fact that no regression line is perfect. You don't have any random variable that accounts for how the data deviates from a regression line.
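
That said, for the specific transformation in the question, X = (c1 - slope)/c2 with exact constants, the usual first-order propagation rule is simple, because X is a linear function of the slope: its standard error is just the slope's standard error divided by |c2|. A minimal sketch with placeholder numbers:

```python
# Minimal sketch: propagate the slope's standard error through
# X = (c1 - slope) / c2, with c1 and c2 treated as exact constants.
# All numbers here are hypothetical placeholders.
slope = 0.85       # fitted slope from the regression
se_slope = 0.04    # standard error of the slope reported by the software
c1, c2 = 2.0, 3.0  # exact constants

X = (c1 - slope) / c2

# X is a *linear* function of the slope, so first-order ("delta method")
# propagation is exact here: sigma_X = |dX/d(slope)| * sigma_slope = sigma_slope / |c2|.
se_X = se_slope / abs(c2)

print(f"X = {X:.4f} +/- {se_X:.4f}")
```

As noted above, this only captures the uncertainty in the fitted slope itself; it says nothing about how individual data points scatter around the regression line.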
 
  • #6
Justhanging said:
I have a couple of questions about this, and I was hoping someone with some stats knowledge could clarify.

First, when people report numbers such as 10 plus or minus 5, what does the 5 mean? Is it the standard deviation or the confidence interval or the variance? What is the relationship between all these terms?

Secondly, when a linear regression is done in Excel (or some other software) and the standard errors of the slope and intercept are calculated, how do I get from those values to the plus or minus value used above? Basically, what I'm asking is: how is the standard error related to the standard deviation, confidence interval, or plus or minus values?

Also, how do I use the propagation of error equations? What do I use for the uncertainty in each variable?

There is a lot of jargon here that I don't really understand, can someone clarify?

In linear regression software, you are generally dealing with a "sample" standard deviation and not the true standard deviation. You *CAN* get an unbiased estimate of the standard deviation from a sample deviation. The formula is listed in some software I wrote, here: https://www.physicsforums.com/showthread.php?t=561799
I made small mistakes in posts #1 and #4 of that thread (I was trying to work out and test the problem as I went along); please see post #5 there for the correct formula.
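
(The poster's own formula is in the linked thread; purely as a general illustration, one common way to de-bias the sample standard deviation for normally distributed data is the c4 correction factor, sketched below.)

```python
import math

def unbiased_std(data):
    """Sample standard deviation corrected by the c4 factor.

    With Bessel's correction (dividing by n - 1) the sample *variance* is
    unbiased, but the sample standard deviation is still slightly biased low.
    For normally distributed data, dividing by c4(n) removes that bias.
    (Illustrative only -- the linked thread has the poster's own formula.)
    """
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # Bessel-corrected
    c4 = math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)
    return s / c4

print(unbiased_std([9.8, 10.1, 10.3, 9.9, 10.0]))
```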

I also discuss the nomenclature of ± as specified by the National Institute of Standards and Technology (NIST), which is a published convention in common American (USA) usage. So if you come across a number like 506(1), that would indicate 506 with a *standard deviation* of 1 in the last digit. In this example, that means the measured value would land within one unit of the last digit (between 505 and 507) about 68.2% of the time.

The typical propagation of error equations use standard deviations.
For example, 32(3) + 11(2) + 5(1) would equal (32 + 11 + 5) with an uncertainty of √(3² + 2² + 1²), i.e. 48 ± √14 ≈ 48 ± 3.7.

The errors (standard deviations) add as if they were orthogonal axes (Pythagorean addition).
If your error isn't reported in NIST format (for repeated measurements), then the other poster's comments apply -- this basic formula only works when the data are uncorrelated.
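
A minimal sketch of that quadrature rule for sums of independent quantities, using the 32(3) + 11(2) + 5(1) example above:

```python
import math

def add_with_uncertainty(terms):
    """Sum independent quantities given as (value, standard_deviation) pairs.

    For uncorrelated errors the standard deviations add in quadrature
    (Pythagorean addition), exactly as described above.
    """
    total = sum(v for v, _ in terms)
    sigma = math.sqrt(sum(s ** 2 for _, s in terms))
    return total, sigma

value, sigma = add_with_uncertainty([(32, 3), (11, 2), (5, 1)])
print(f"{value} +/- {sigma:.2f}")   # 48 +/- 3.74
```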

To check whether your data are uncorrelated, you need to look at the residuals (i.e., the difference between each data point and the fitted line). Correlation shows up visually as clusters of data, or as the data curving away from the fit in a predictable way.

The ideal residual pattern for *uncorrelated data* is white random noise: the residuals will fill a roughly rectangular band when plotted, and the individual data points will "stipple" out that rectangle evenly across the entire line fit, with no *other* rhyme or reason to their locations.
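
A rough sketch of that residual check -- fit a line, subtract it, and eyeball whether what is left looks like a featureless band (made-up data; numpy's polyfit is used just for convenience):

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up example data with a linear trend plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.size)

# Fit a straight line and compute the residuals (data minus fitted line).
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Uncorrelated errors should form an even, structureless band around zero;
# curvature or clustering in this plot suggests correlation or a poor model.
plt.scatter(x, residuals, s=10)
plt.axhline(0.0, linestyle="--")
plt.xlabel("x")
plt.ylabel("residual")
plt.show()
```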

Typical linear regression formulas base the slope of the line on the sample statistics of the data: for ordinary least squares, the slope is the sample covariance of x and y divided by the sample variance of x (equivalently, the correlation coefficient times the ratio of the y and x sample deviations). In software, however, other techniques such as iterative or randomized fitting are often used -- there are *many* variations on that theme. I don't even know what Excel does, myself!

For correlated data, the error propagation formulas become more complex. In the simpler (Pearson) form, the correlation turns the simple sum of squares of sample deviations into a full quadratic form: it adds cross-product terms between each pair of variables in proportion to the Pearson "correlation" value (i.e., the covariance matrix entries).
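
In matrix form, the propagated variance of a (locally) linear combination of inputs is the quadratic form gᵀΣg, where g holds the sensitivities (partial derivatives) and Σ is the covariance matrix; the off-diagonal covariance entries are exactly those cross-product terms. A minimal sketch with made-up numbers:

```python
import numpy as np

# Minimal sketch of first-order error propagation with correlated inputs:
#   var(f) ~ g^T Sigma g,  where g_i = df/dx_i and Sigma is the covariance matrix.
# For f = a*x + b*y the gradient is just (a, b).  All numbers are made up.
a, b = 1.0, 1.0                 # coefficients of f = a*x + b*y
sigma_x, sigma_y = 3.0, 2.0     # standard deviations of x and y
rho = 0.6                       # Pearson correlation between x and y

g = np.array([a, b])
cov = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                [rho * sigma_x * sigma_y, sigma_y**2]])

var_f = g @ cov @ g             # quadratic form: squares plus cross terms
print(np.sqrt(var_f))           # ~4.49, vs sqrt(9 + 4) ~ 3.61 if uncorrelated
```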

Error propagation (AKA numbers with uncertainties) can also take the path of choosing error bounds, which is what I think you are asking about when you say "confidence interval". In that case the "error bounds" are often added directly, no assumptions are made concerning the correlation of the data, and the sum-of-squares formula is discarded.
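
A sketch of that worst-case bound arithmetic, for contrast with the quadrature example above (illustrative only):

```python
def add_bounds(terms):
    """Worst-case ("error bound") addition: bounds add directly, with no
    assumption about correlation, instead of adding in quadrature."""
    total = sum(v for v, _ in terms)
    bound = sum(b for _, b in terms)
    return total, bound

print(add_bounds([(32, 3), (11, 2), (5, 1)]))   # (48, 6) -- compare 48 +/- 3.74 above
```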

Caution: when multiplying two numbers with uncertainty, where each is assumed to have a standard or sample deviation, the result is *not* normally distributed, and the resulting deviations in fact behave like those of mildly correlated data.

This is a problem I am still trying to solve and understand well myself. I have discovered that the typical error propagation formulas for multiplication can be *quite* inaccurate depending on the magnitude of the data, and that of the variation (error).
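
One way to see how far off the usual multiplication rule gets is to compare it against a brute-force Monte Carlo simulation; the gap grows as the relative uncertainties grow. A rough sketch, assuming independent normal inputs with made-up means and deviations:

```python
import numpy as np

# Compare the first-order rule for the uncertainty of a product,
#   (sigma_z / z)^2 ~ (sigma_x / x)^2 + (sigma_y / y)^2,
# against a Monte Carlo estimate.  Made-up independent normal inputs.
rng = np.random.default_rng(1)
mu_x, sigma_x = 10.0, 3.0   # deliberately large relative uncertainty
mu_y, sigma_y = 4.0, 1.5

# First-order (relative errors in quadrature) prediction
z0 = mu_x * mu_y
sigma_z_approx = z0 * np.sqrt((sigma_x / mu_x) ** 2 + (sigma_y / mu_y) ** 2)

# Monte Carlo: the product's distribution is skewed, not normal
x = rng.normal(mu_x, sigma_x, 1_000_000)
y = rng.normal(mu_y, sigma_y, 1_000_000)
z = x * y

print(f"first-order sigma: {sigma_z_approx:.2f}")   # ~19.2
print(f"Monte Carlo sigma: {z.std():.2f}")          # ~19.7 -- the gap widens as relative errors grow
```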

I haven't solved that problem yet myself ... so if you learn anything useful, please pass it on... :smile:
 

Related to Question about propagation error and linear regression?

1. What is propagation error in linear regression?

Propagation error (error propagation) describes how uncertainties in the input variable(s) of a linear regression model carry through to the dependent variable, resulting in a less accurate prediction of the dependent variable.

2. How does propagation error affect the accuracy of a linear regression model?

Propagation error can cause the predicted values of the dependent variable to deviate significantly from the actual values, leading to a decrease in the overall accuracy of the linear regression model.

3. What are some ways to reduce propagation error in a linear regression model?

One way to reduce propagation error is to increase the sample size of the data used for the regression analysis. Additionally, using more accurate and precise measurement techniques for the independent variables can also help reduce propagation error.

4. Can propagation error be completely eliminated in a linear regression model?

No, propagation error cannot be completely eliminated, as it is a natural consequence of using imperfect data and making assumptions in a linear regression model. However, it can be minimized through careful data collection and analysis.

5. Is propagation error the only source of uncertainty in a linear regression model?

No, there can be other sources of uncertainty in a linear regression model, such as measurement errors and model assumptions. However, propagation error is an important factor to consider as it can have a significant impact on the accuracy of the model.
