Confused about statistics terms

Will · Mar 5, 2004

I have always thought that the operation
[tex]\sqrt{\frac{(x_1-\overline{x})^2 + \dots + (x_N-\overline{x})^2}{N-1}}[/tex]
was known as the standard deviation. But now my physics text says that it is
[tex]\sqrt{\frac{(x_1-\overline{x})^2 + \dots + (x_N-\overline{x})^2}{N}}[/tex]
and another website I went to says that yes, the second equation is the correct terminology, that the first equation is the square root of the bias-corrected variance, and that the two are often confused. So what does the "standard deviation" operation on my TI-89 do? And how do I do the other one?

cookiemonster · Mar 5, 2004

I've only had an introductory course in Statistics, but I was taught that the division by N corresponded to the population's standard deviation whereas the division by n-1 corresponded to the sample's standard deviation.

i.e.,

[tex]\sigma = \sqrt{\frac{\sum{(x-\mu)^2}}{N}}[/tex] will be used to find the standard deviation of a population,

and

[tex]S = \sqrt{\frac{\sum{(x-\overline{x})^2}}{n-1}}[/tex] will be used to find the standard deviation of a sample of the population.

I would assume that your calculator would assume that a list is a sample, not a population, and so would use the second equation. However, I know that the TI-83+ will give you both if you perform 1-Var Stats on the list.
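As a rough illustration of the two formulas above, here is a minimal Python sketch. The data values and the helper name `std_dev` are made up for the example; `statistics.pstdev` and `statistics.stdev` are the Python standard library's names for the divide-by-N and divide-by-(n-1) versions, and say nothing about what any particular calculator does.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values


def std_dev(values, ddof):
    """Square root of the summed squared deviations over (len(values) - ddof)."""
    mean = sum(values) / len(values)
    squared = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared / (len(values) - ddof))


population_sd = std_dev(data, 0)  # divide by N
sample_sd = std_dev(data, 1)      # divide by n - 1

# The standard library exposes the same two quantities.
assert math.isclose(population_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))

print(population_sd, sample_sd)
```

For the same list, the n-1 version is always the larger of the two, and the gap shrinks as the list gets longer.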

Perhaps someone more educated can help you more.

cookiemonster

Damned charming :) · Mar 6, 2004

cookiemonster is totally correct.
Imagine you wanted to calculate the standard deviation of people's heights. If you had every single person's height, you would calculate the population standard deviation, whose formula is
[tex]\sqrt{\frac{(x_1-\overline{x})^2 + \dots + (x_N-\overline{x})^2}{N}}[/tex]

More realistically, you would estimate the standard deviation using a sample of 100 people's heights. In this case, use the sample standard deviation formula
[tex]\sqrt{\frac{(x_1-\overline{x})^2 + \dots + (x_n-\overline{x})^2}{n-1}}[/tex]
Clearly, sample standard deviation is the default definition of standard deviation.

Sample standard deviation is usually denoted as "s" or "sigma (n-1)",
and population standard deviation is denoted by "sigma (n)". Your calculator certainly would have a "sigma(n-1)" button.

mathman · Mar 6, 2004

The difference between dividing by N vs. dividing by N-1 results from the fact that for the entire population, you can calculate the exact average, while for a sample you have an approximate average. When you calculate the mathematical expectation of the sample deviation (with N-1) it will equal the population deviation.

matt grime · Mar 6, 2004

People appear to be making claims here that aren't true. The N-1 version does not give you the population s.d. It is the unbiased estimator of the population deviation, i.e., the best we can do under certain constraints. The unbiased estimator of the population mean is the sample mean, fortunately.

Just imagine we draw two samples from the same population. From what was written above one would think there are two S.D. of the population, one coming from each sample.
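The two-sample thought experiment can be sketched in Python. Everything here is an assumption made for illustration: the simulated population of 10,000 heights, the sample size of 100, and the fixed seed.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated heights in cm.
population = [rng.gauss(170.0, 10.0) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # one fixed number for this population

# Two independent samples of 100 drawn from the same population.
sample_a = rng.sample(population, 100)
sample_b = rng.sample(population, 100)

s_a = statistics.stdev(sample_a)  # n - 1 in the denominator
s_b = statistics.stdev(sample_b)

print(f"population s.d.:        {sigma:.3f}")
print(f"estimate from sample A: {s_a:.3f}")
print(f"estimate from sample B: {s_b:.3f}")
```

The population has a single standard deviation; each sample only produces its own estimate of it, and the two estimates generally differ.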

cookiemonster · Mar 7, 2004

matt, would you explain the difference a bit more? I don't quite understand what you're getting at and my intro class never really got into enough detail to warrant a thorough treatment.

cookiemonster

matt grime · Mar 7, 2004

OK, so we have a population with unknown mean and standard deviation. We take a sample from it and we want to work out some statistics. We've got the mean and the ordinary standard deviation lying around (the one dividing by n). The question asked is, 'Do these form an unbiased estimate of the population mean and s.d.?'

Firstly, X is an unbiased estimator of a parameter Y if E(X)=Y. The sample mean is an unbiased estimator of the population mean, but if you actually work out the mean of the standard deviation, it is not the population s.d.

So we use the n-1 quantity instead, which we found coming into the calculation above as a measure of the bias of our estimate.

My nit-picking was that you gave the impression that it IS the standard deviation of the population. It isn't; that is unknown. It is an unbiased estimator of it, and is often called, by abuse of notation, the pop. s.d., which is subtly different, as there is an implication there that we mean more.

CORRECTION

The square of the statistic we are referring to as n-1 is the unbiased estimator of the pop. variance; it is not in general itself an unbiased estimator of the pop. s.d., see e.g. the Wolfram entry. My memory is getting terrible these days. At least I think it might be, I don't recall clearly.
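One way to see why the correction holds, assuming independent draws from a population with finite variance [itex]\sigma^2[/itex] and writing [itex]s^2[/itex] for the n-1 statistic: the variance estimator is unbiased, but taking a square root is a concave operation, so Jensen's inequality pushes the expectation of [itex]s[/itex] below [itex]\sigma[/itex].

[tex]E[s^2] = \sigma^2, \qquad E[s] = E\left[\sqrt{s^2}\right] \le \sqrt{E[s^2]} = \sigma[/tex]

The inequality is strict unless [itex]s^2[/itex] takes the same value for every sample.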

cookiemonster · Mar 8, 2004

So, if I'm reading Mathworld correctly and remembering the class correctly, if we repeatedly sampled a population and repeatedly calculated [itex]s_{N-1}^2[/itex] of these samples, the average of these values would yield the true variance?

cookiemonster

matt grime · Mar 8, 2004

Not exactly. The only way to *know* the *true* population variance is to sample every member of the population. The more samples you take from a population the better the estimate will be of this.
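For a toy population small enough to list every possible sample, the expectation being discussed can be checked directly. The sketch below assumes a made-up five-value population and samples of size 3 drawn with replacement, so every ordered sample is equally likely. With a real population, only finitely many samples are ever drawn, so the average stays an estimate.

```python
import itertools
import math
import statistics

population = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy population, made up
n = 3  # sample size

true_variance = statistics.pvariance(population)  # divide by N

# Every ordered sample of size n drawn with replacement (all equally likely).
samples = list(itertools.product(population, repeat=n))

# Average each statistic over all possible samples, i.e. its expectation.
mean_s2_unbiased = statistics.mean([statistics.variance(s) for s in samples])
mean_s2_biased = statistics.mean([statistics.pvariance(s) for s in samples])
mean_s = statistics.mean([statistics.stdev(s) for s in samples])

print("population variance:       ", true_variance)
print("mean of s^2 (n-1 divisor): ", mean_s2_unbiased)
print("mean of s^2 (n divisor):   ", mean_s2_biased)
print("population s.d.:           ", math.sqrt(true_variance))
print("mean of s (n-1 divisor):   ", mean_s)
```

Averaged over all possible samples, the n-1 variance matches the population variance, the divide-by-n variance falls short by a factor of (n-1)/n, and the n-1 standard deviation still comes out below the population s.d.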
