Confidence interval from least squares fitting?

In summary, the conversation discusses the problem of finding the best-fit single-parameter theoretical distribution for experimental data and how to obtain a confidence interval for the parameter value. Suggested methods include using the formula for the fitted parabola, using the residual errors from the curve fit, or viewing the least-squares fit as a maximum likelihood procedure and using the asymptotic variance. Alternatively, the problem can be simulated with Monte Carlo methods to check the results.
  • #1
oaktree
Hello,

Let me get right to my problem. I have an experimental distribution and a single-parameter theoretical distribution.

I want to find the value for the best fit theoretical distribution that agrees with my experimental data for the bulk of the distributions (the tails of my distributions differ substantially).

I isolate an equal portion of both distributions and calculate the sum of the squares of the differences between the two distributions over this region, i.e. a least-squares approach:
[itex] R^2 = \sum_i [ \mathrm{exp}_i - \mathrm{theo}_i(x) ]^2 [/itex]

I do this for several values that I have chosen for the single-parameter theoretical distribution and obtain a unique parameter value (which I call x=ζ) which results in the minimization of the sum of the squares (exactly what I want). Every other parameter gives a larger value for this sum of squares.

I do not know if this is necessary, but I can also plot every parameter value I tested vs. [itex]R^2[/itex]. The points combine to form a parabola, [itex] ax^2 + bx + c = R^2 [/itex]. Setting the derivative of the parabola to zero gives its minimum, which again occurs at ζ. I have attached a pdf of this.
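Roughly, my procedure looks like the Python sketch below. The data and the model here are just placeholders standing in for my actual distributions:

```python
import numpy as np

# Placeholder experimental distribution over the compared region, and a
# placeholder one-parameter model theo(x) evaluated on the same bins.
exp_data = np.array([0.12, 0.25, 0.31, 0.22, 0.10])
theo = lambda x: np.exp(-((np.arange(5) - x) ** 2) / 2.0)
theo_norm = lambda x: theo(x) / theo(x).sum()   # normalize like a distribution

# Scan candidate parameter values and record the sum of squared differences.
xs = np.linspace(1.0, 3.0, 21)
r2 = np.array([np.sum((exp_data - theo_norm(x)) ** 2) for x in xs])

# Fit a parabola a*x^2 + b*x + c to the (x, R^2) points; its vertex -b/(2a)
# is the least-squares estimate zeta.
a, b, c = np.polyfit(xs, r2, 2)
zeta = -b / (2.0 * a)
print(f"best-fit parameter zeta = {zeta:.4f}")
```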

My question is, how do I find a confidence interval for this? I am looking for [itex]\sigma[/itex] in ζ ± [itex]\sigma[/itex]. Do I use the formula for the parabola to find this? Do I use [itex]R^2[/itex]?

I have looked through some books and online and have been unsuccessful in finding this [itex]\sigma[/itex] value. Any help is appreciated; a reference book would be of great help too. Thanks!
 

Attachments

  • All Results Graph.pdf
169.4 KB
  • #2
oaktree said:
I do this for several values that I have chosen for the single-parameter theoretical distribution and obtain a unique parameter value (which I call x=ζ) which results in the minimization of the sum of the squares (exactly what I want).

My question is, how do I find a confidence interval for this? I am looking for [itex]\sigma[/itex] in ζ ± [itex]\sigma[/itex]. Do I use the formula for the parabola to find this? Do I use [itex]R^2[/itex]?

I may (or may not!) understand the usual method for getting confidence intervals for parameters of a curve fit. I've remarked about it in several posts and not gotten any corrections; we'll see if my luck holds.

It involves some commonly made and implausible assumptions. It goes something like this. The value of the parameter is a known function of the data. I don't want to call the parameter "x", so let's call the parameter p and say
[itex] p = F(X_1,X_2,...X_n) [/itex] where the [itex] X_i [/itex] are the data.

You may not know the symbolic expression for [itex] F [/itex] , but you have a numerical method for computing it, namely your curve fitting algorithm.

Let's say that your particular curve fit found that [itex] p = p_0 [/itex] when the specific data was [itex] X_1 = x_1, X_2 = x_2,... X_n = x_n [/itex].

Find (symbolically or numerically) the differential expression that approximates a change in [itex] p_0 [/itex] as a function of changes in the [itex] x_i [/itex]:

[itex] p_0 + \delta p = F(x_1,x_2,...) + \delta x_1 \frac{\partial F}{\partial X_1} + \delta x_2 \frac{\partial F}{\partial X_2} + ...[/itex]

[itex] \delta p = \delta x_1 \frac{\partial F}{\partial X_1} + \delta x_2 \frac{\partial F}{\partial X_2} + ...[/itex]

Think of the [itex] \delta x_i [/itex] as the random errors in measuring each of the [itex] x_i [/itex]. Assume these are independent normally distributed random variables with mean zero, and estimate their standard deviations. (You might have some way of doing this that is independent of the curve fit. I suppose some people use the "residuals", i.e. the "error" between the values predicted by the curve and the actual data, to do this estimation. I'm not sure how that can be justified by logical reasoning!)

The above approximation expresses the random variable [itex] \delta p [/itex] as a linear function of the independent mean zero normal random variables [itex]\delta x_i [/itex], which have known standard deviations. From this you can compute the standard deviation of [itex] \delta p [/itex] and say things about confidence intervals around [itex] p_0 [/itex].
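As a sketch of that computation (fit_parameter here is a stand-in for whatever your curve-fitting algorithm is, and the error standard deviations are assumed values, not something I can supply for you):

```python
import numpy as np

# Placeholder fitting routine: maps a data vector to the fitted parameter.
# In practice this would re-run your least-squares fit on the given data.
def fit_parameter(data):
    return data.mean()

x = np.array([1.1, 0.9, 1.3, 1.0, 0.8])   # observed data x_1 ... x_n
sigma = np.full_like(x, 0.1)               # assumed std devs of the measurement errors

p0 = fit_parameter(x)

# Numerically estimate dF/dX_i by perturbing one data point at a time.
h = 1e-5
grad = np.empty_like(x)
for i in range(len(x)):
    xp = x.copy()
    xp[i] += h
    grad[i] = (fit_parameter(xp) - p0) / h

# For independent mean-zero errors: var(delta_p) = sum (dF/dX_i)^2 sigma_i^2
sigma_p = np.sqrt(np.sum(grad**2 * sigma**2))
print(f"p0 = {p0:.4f}, linearized sigma = {sigma_p:.4f}")
```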

The method is suspicious enough that the technical term for it is not "confidence interval", although some software packages are brazen enough to call it that. I think it is technically a "linearized confidence interval" and it may need another adjective (that I don't recall at the moment) to warn people.

If your theoretical distribution is from a commonly used family of curves, you might be able to look up the formula for the linearized confidence interval that the above method produces.
 
  • #3
A least squares fit can be viewed as a special case of a maximum likelihood procedure where you assume that the log-likelihood L is, up to terms not depending on x, equal to [itex]-R^2[/itex].
That is, the probability density function for obtaining your data exp_i given x is
[itex] p(\{exp_i\})=C \exp(-R^2) [/itex].
Now you turn the handle of general theorems about maximum likelihood to find that the asymptotic variance is [itex] \sigma^2= (\partial^2 R^2/\partial x^2)^{-1} [/itex], i.e. one over the second derivative of your parabola.
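In terms of the parabola [itex] ax^2 + bx + c = R^2 [/itex] from post #1, the second derivative is 2a, so the calculation is a one-liner (a sketch with made-up coefficients):

```python
import numpy as np

a, b, c = 4.0, -16.0, 17.0        # hypothetical parabola coefficients from the R^2 scan
zeta = -b / (2.0 * a)             # minimum of the parabola
sigma = np.sqrt(1.0 / (2.0 * a))  # sigma^2 = (d^2 R^2 / dx^2)^{-1} = 1/(2a)
print(f"zeta = {zeta:.3f} +/- {sigma:.3f}")
```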
 
  • #4
Another thought: You could simply Monte-Carlo the problem. Simulate sets of data with various simulated "errors" and plot the distribution of the estimated parameter. Even if you do the problem another way, you could use a simulation to check your result.
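A sketch of such a simulation, where the data generator and fitting routine are placeholders for your own:

```python
import numpy as np

rng = np.random.default_rng(0)

true_param = 2.0   # assumed "true" parameter for the simulation
sigma_err = 0.1    # assumed measurement error

def simulate_data(param, n=50):
    # Placeholder: data scattered around the parameter with Gaussian errors.
    return param + rng.normal(0.0, sigma_err, size=n)

def fit_parameter(data):
    # Placeholder fit: here simply the sample mean.
    return data.mean()

# Re-fit many simulated data sets and look at the spread of the estimates.
estimates = np.array([fit_parameter(simulate_data(true_param))
                      for _ in range(10000)])
print(f"spread of the estimated parameter: {estimates.std():.4f}")
```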
 
  • #5


Hello,

Thank you for sharing your problem with me. It appears that you are using a least squares fitting approach to find the best fit theoretical distribution for your experimental data. This method is commonly used in scientific research to find the optimal parameters for a model that best fits the data.

To find a confidence interval for your parameter value ζ, you can use the standard error of the estimate (SEE) for your least squares fit. This can be calculated by taking the square root of the sum of squared residuals divided by the degrees of freedom (n - p), where n is the number of data points and p is the number of parameters in your model. The confidence interval for ζ would then be ζ ± t*SEE, where t is the critical value from the t-distribution for your desired confidence level and degrees of freedom (n - p).
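A rough sketch of that calculation in Python, with placeholder residuals (scipy supplies the t critical value):

```python
import numpy as np
from scipy import stats

# Placeholder residuals: exp_i - theo_i(zeta) at the best-fit parameter.
residuals = np.array([0.02, -0.01, 0.03, -0.02, 0.01])
n, p = len(residuals), 1                      # p = 1 for a one-parameter model

see = np.sqrt(np.sum(residuals**2) / (n - p))  # standard error of the estimate
t_crit = stats.t.ppf(0.975, df=n - p)          # 95% two-sided critical value

zeta = 2.0                                     # placeholder best-fit value
print(f"95% interval: {zeta:.3f} +/- {t_crit * see:.3f}")
```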

I would recommend consulting a textbook on regression analysis or statistical methods for more information on calculating confidence intervals for least squares fits. Additionally, there are many online resources and software packages that can assist with this calculation. I hope this helps and good luck with your research!
 

Related to Confidence interval from least squares fitting?

What is a confidence interval from least squares fitting?

A confidence interval from least squares fitting is a range of values within which we can be confident that the true value of a population parameter lies. It is calculated using the least squares method, a statistical technique for determining the best-fit line or curve for a set of data points.

How is a confidence interval from least squares fitting calculated?

A confidence interval from least squares fitting is calculated using the standard error of the estimate, which takes into account the variability of the data points around the regression line. The calculation involves using the sample size, the standard deviation of the residuals, and the critical value from the t-distribution.

What does a confidence interval from least squares fitting tell us?

A confidence interval from least squares fitting tells us the range of values within which we can be confident that the true population parameter lies. It gives us a measure of the precision of our estimate and helps us determine the level of uncertainty in our data.

Why is a confidence interval from least squares fitting important?

A confidence interval from least squares fitting is important because it helps us make more accurate and reliable conclusions about the population based on a sample. It also allows us to assess the validity of our model and determine the level of uncertainty in our data.

Can a confidence interval from least squares fitting be interpreted as a probability?

No, a confidence interval from least squares fitting cannot be interpreted as a probability. It is a range of values within which we can be confident that the true population parameter lies, not a statement about the probability of obtaining a specific value. However, we can use the confidence level associated with the interval to make statements about the likelihood of the true value falling within that range.
