Estimating the Variance of a Normal Distribution

  • #1
steviekm3
Suppose we have a normal distribution and a sample of n values from the normal distribution.

To estimate the variance we can use the usual sample variance formula: the sum of squared distances from the mean divided by either n (biased estimator) or n-1 (unbiased estimator).

There is another property of the normal distribution that can possibly be used to estimate the variance, namely that the mean absolute deviation from the mean =
sqrt(2/pi) * std deviation

What I was wondering is: is it possible to calculate the sample mean absolute deviation from the sample mean and then divide this by sqrt(2/pi) to get an estimate of the standard deviation? If so, how does it compare with the regular formulas for estimating the std deviation?
 
  • #2
All you're doing is using the standard deviation of the sample to estimate the standard deviation of the population (sqrt(2/pi) * std deviation divided by sqrt(2/pi) is just the standard deviation), and then, presumably, squaring it to get your estimate of the variance. So you're comparing the standard variance estimate, with n-1 in the denominator, against a computation of the sample variance that uses n in the denominator. The difference is that your formula is a biased estimate, which means it will systematically deviate from the true value in some direction, though the difference will be negligible for large samples.

I ran a simulation just to double check, and your estimate systematically underestimates the population variance.
 
  • #3
I don't think my calculation uses the sample standard deviation formula.

For example, suppose these are the numbers in the series:

1,
2,
3

I would find the mean of these, which is 2,

and then find the average of |1-2|, |2-2|, |3-2|, which is 2/3.

Then the estimate for the std deviation of the population would be (2/3)/sqrt(pi/2)
 
  • #4
I ran a simulation. Your formula radically underestimates the population variance.
We already have a minimum-variance unbiased estimator for the variance of a normal population, so there's really no need to use anything else in most situations anyway.
 
  • #5
I ran some simulations as well and I cannot see the radical underestimation of the population variance. The term "radical" is subjective anyhow; do you have a more quantifiable description?
 
  • #6
steviekm3 said:
I ran some simulations as well and I cannot see the radical underestimation of the population variance. The term "radical" is subjective anyhow; do you have a more quantifiable description?

The first, and really the only, important "quantifiable" description is that your estimate of the population variance is further from the true variance than the usual estimator (the sample variance with n-1 in the denominator). To give you an idea of the magnitude of the error, I drew one thousand samples of 100 from a standard normal distribution and computed the average estimate for both of our estimators. Yours estimates the population SD to be 0.636, whereas the standard estimator comes out at 0.999 (1 is the correct value). More importantly, your estimate doesn't seem to converge to the true value, which makes it biased (it systematically deviates from the true value).

We're actually being fairly "un-rigorous" here, since estimating the population SD is fairly complicated. We have a very good (the best possible) estimator for the variance of a normal population (the usual formula, with n-1 in the denominator), but the square-root of this value is not a great estimator of the SD (though, it's pretty good in some cases).
 
  • #7
Hey steviekm3 and welcome to the forums.

Are you aware of the estimators used for the variance (in particular the MLE) and of the properties of a good estimator (unbiased, consistent)? Also, are you aware of the criterion for the best estimator (Fisher information)?

All of these characteristics are used not only to derive an estimator, but also to show that, under the information criterion, an estimator is 'optimal'.
 
  • #8
steviekm3 said:
What I was wondering is: is it possible to calculate the sample mean absolute deviation from the sample mean and then divide this by sqrt(2/pi) to get an estimate of the standard deviation? If so, how does it compare with the regular formulas for estimating the std deviation?

An interesting article on the web discusses the relative merits of the sample mean deviation vs. the sample standard deviation (as estimators of their respective population parameters): http://www.leeds.ac.uk/educol/documents/00003759.htm. It gives some arguments in favor of using the mean absolute deviation when the distribution is NOT perfectly Gaussian.

(If we are going to get into dueling simulations, it would be useful if each party states whether his simulation samples from a Gaussian or some other distribution. On a computer, a nominal Gaussian will actually be a discrete version of a truncated Gaussian.)

As Chiro has hinted, to compare formulas for estimators one needs to specify what is being compared. (Interestingly, there is no estimator for the variance of a Gaussian that is "best" by all the usual criteria for comparison. Virtualtux points this out in post #12 of the thread https://www.physicsforums.com/showthread.php?t=616643.)

So far, nobody in this thread has been able to answer your question with respect to any of the well-known criteria, and I can't either. To make such a comparison, we also have to be specific about whether the goal is to estimate the variance or to estimate the standard deviation, or whether the goal is to estimate the distribution itself, i.e. to estimate it as a function by some of the criteria that are used to measure how well one function approximates another.
 
  • #9
(If we are going to get into dueling simulations, it would be useful if each party states whether his simulation samples from a Gaussian or some other distribution. On a computer, a nominal Gaussian will actually be a discrete version of a truncated Gaussian.)

We've been discussing normal populations explicitly, so I didn't bother outlining the procedure. However, in the interest of transparency:

10000 samples of size 50 were drawn from a standard normal distribution, and the population variance was estimated using the standard unbiased estimator, the OP's estimator, and the LSE and MLE (because the thread you linked to was interesting; clearly, I was wrong about the standard unbiased estimator being the best possible). The mean and variance of each estimate were as follows...

Unbiased
Mean: 0.9986
Variance: 0.0407

OP's Estimator
Mean: 0.4074
Variance: 0.0075 (!)

LSE
Mean: 0.9786
Variance: 0.0394

MLE
Mean: 0.9594
Variance: 0.0376

Mind you, this is using the square of the OP's estimate as an estimate of the variance (so that everything is estimating the same statistic). If we instead use the square root of each variance estimator as an estimate of the SD (which is not ideal either, but I think is what the OP was suggesting; part of the problem is that he's comparing his estimator of the SD to an estimator of the variance), we get...

Unbiased
Mean: 0.9956
Variance: 0.0102

OP's Estimator
Mean: 0.6306
Variance: 0.0046 (!)

LSE
Mean: 0.9759
Variance: 0.0098

MLE
Mean: 0.9856
Variance: 0.0100
 
  • #10
Okay, I got some more time to work on this. What I found is that, I believe, a correction factor has to be added to the estimator. When I add in this correction factor, the estimator should be an unbiased estimator of the standard deviation.

Here is code that compares the standard estimator (take the sqrt of S^2) with this estimator. I get average estimator values of around 0.999 then. The standard estimator is not as close because it is biased, but I believe an adjustment factor (of a different form) can fix the standard estimator. I have not looked into how fast they converge, but I'll work on this next. Note that the standard estimator for the variance is not biased. I don't think squaring this new estimator will produce an unbiased estimator, but I have to look more closely (Jensen's inequality).

Note: be careful with the coding, as at first I must have had something wrong in the formula; I got around 0.63 for the std dev, which was similar to what you got.

double totalAvg = 0.0;
double totalAvg2 = 0.0;
size_t totalIterations = 100000;
size_t sampleSize = 20;
for (size_t i = 0; i < totalIterations; ++i)
{
    std::vector<double> rns;
    for (size_t j = 0; j < sampleSize; ++j)
    {
        double rn = gsl_ran_gaussian(r, 1.0); // Box-Muller draw from N(0,1)
        rns.push_back(rn);
    }
    double stdStdDev = CalculateStdDev(rns.begin(), rns.end()); // regular std deviation estimator
    double mean = CalculateMean(rns.begin(), rns.end());
    double totalAbs = 0.0;
    for (size_t j = 0; j < sampleSize; ++j)
    {
        totalAbs += fabs(rns[j] - mean);
    }
    // Use floating-point division: (sampleSize-1)/sampleSize with size_t
    // operands is integer division and would evaluate to zero.
    double correctionFactor = (sampleSize - 1.0) / sampleSize;
    double pi = 3.14159265358979;
    double stdDevEstimator = (totalAbs / sampleSize) / sqrt(2.0 * correctionFactor / pi);
    totalAvg += stdDevEstimator;
    totalAvg2 += stdStdDev;
}
logStream << "Avg, MAD-based estimator: " << AsString(totalAvg / totalIterations, 6) << COStream::endl;
logStream << "Avg, sqrt(S^2) estimator: " << AsString(totalAvg2 / totalIterations, 6) << COStream::endl;
 
  • #11
Stephen Tashi said:
So far, nobody in this thread has been able to answer your question with respect to any of the well-known criteria, and I can't either. To make such a comparison, we also have to be specific about whether the goal is to estimate the variance or to estimate the standard deviation, or whether the goal is to estimate the distribution itself, i.e. to estimate it as a function by some of the criteria that are used to measure how well one function approximates another.

For the particular application I'm working on, I'm looking for an unbiased estimator of the standard deviation. The reason is that I have sample points from which to infer the distribution. Once I have the distribution I have to run a simulation on it, and the simulation generates random normal numbers. The function that generates the random normals takes a standard deviation, so I figure it's best to get an estimator for the standard deviation that I can feed into the generator. All this is more for interest's sake, as Number Nine points out that the regular formulas work great.
 
  • #12
steviekm3 said:
For the particular application I'm working on, I'm looking for an unbiased estimator of the standard deviation.
Are you quite sure that's what you want? The square root of the unbiased estimator of the variance is not an unbiased estimator of the s.d.
 
  • #13
haruspex said:
Are you quite sure that's what you want? The square root of the unbiased estimator of the variance is not an unbiased estimator of the s.d.

I only need the std deviation because the library function that I'm using takes the standard deviation as an argument. I could adjust the regular estimator for the standard deviation using:

"en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation"

All of this is more for interest's sake because my n is pretty large (around 250), so I think with that big a sample size the bias becomes tiny.
 
  • #14
Number Nine said:
I ran a simulation. Your formula radically underestimates the population variance.
We already have a minimum-variance unbiased estimator for the variance of a normal population, so there's really no need to use anything else in most situations anyway.

My apologies here, the formula should have been:

(mean absolute deviation) /sqrt(2/pi)

Then to add bias correction:

(mean absolute deviation) /sqrt(2*f/pi)
where f = (n-1)/n
 
  • #15
steviekm3 said:
My apologies here, the formula should have been:

(mean absolute deviation) /sqrt(2/pi)

Then to add bias correction:

(mean absolute deviation) /sqrt(2*f/pi)
where f = (n-1)/n

I haven't done the math on it (doing "the math" with anything involving square roots or absolute values is difficult with continuous distributions), so I can't comment on its unbiasedness or any of its other properties as an estimator. That said, it actually seems to perform pretty well under simulation.
 

Related to Estimating the Variance of a Normal Distribution

What is the normal distribution?

The normal distribution is a probability distribution that is often used to model real-world phenomena. It is characterized by a bell-shaped curve and is symmetric around its mean. Many natural phenomena, such as heights and weights of individuals, can be approximated by a normal distribution.

How is the variance of a normal distribution estimated?

The variance of a normal distribution can be estimated using statistical methods. One common method is to use a sample of data from the normal distribution and calculate the sample variance. This sample variance is then used as an estimate for the population variance.

What is the importance of estimating the variance of a normal distribution?

The variance of a normal distribution is an important measure of spread or variability. It can help us understand how much the data values deviate from the mean and how likely it is for a new data point to fall within a certain range. Estimating the variance allows us to make inferences and predictions about the population based on a sample of data.

What factors can affect the accuracy of estimating the variance of a normal distribution?

The accuracy of estimating the variance of a normal distribution can be affected by the size of the sample used, as well as any outliers or extreme values in the data. In addition, the underlying assumptions of the normal distribution, such as the data being normally distributed and independent, can also impact the accuracy of the estimate.

Are there any other methods for estimating the variance of a normal distribution?

Yes, there are other methods for estimating the variance of a normal distribution, such as the maximum likelihood method and Bayesian methods. These methods may be more complex but can provide more accurate estimates in certain situations.
