Unbiasedness of estimates

In summary: the conversation discusses the concept of unbiased estimators and their importance in statistical analysis. It is common to say that an estimator is unbiased if its expected value equals the parameter it estimates; although definitions of things like "bias" are not entirely satisfactory, this convention is logically consistent and does not lead to any contradiction. The goal is to have estimators that are as accurate as possible.
  • #1
fog37
TL;DR Summary
Unbiasedness of estimates
Hello (again).

I have a basic question about standard error and unbiased estimators.

Let's say we have a population with a certain mean height and a corresponding variance. We can never know these two parameters, the mean and the variance; we can only estimate them. Certainly, the more accurate the estimates, the better.

To achieve that, we want our estimators to be unbiased, so the expected value of the sample mean equals the actual population mean. This unbiasedness is rooted in the theoretical idea that if we took many, many samples, calculated their means, and created the sampling distribution of the means, the mean of those means would be the actual population mean. However, the sample means are all different from each other; some are close to the true mean and some are very off... The standard error of the sample mean is essentially the standard deviation of that (approximately normal) sampling distribution, telling us how much the various sample means differ from each other.
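(To see the two claims above concretely, here is a minimal simulation sketch in Python with NumPy; the population mean, standard deviation, and sample size are arbitrary example values standing in for the unknown parameters. The mean of many sample means lands near ##\mu##, and their spread matches ##\sigma/\sqrt{n}##.)

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 170.0, 10.0, 25   # hypothetical population mean/SD and sample size
num_samples = 100_000

# Draw many samples and record each sample's mean.
sample_means = rng.normal(mu, sigma, size=(num_samples, n)).mean(axis=1)

print(np.mean(sample_means))   # close to mu (unbiasedness of the sample mean)
print(np.std(sample_means))    # close to sigma / sqrt(n) (the standard error)
print(sigma / np.sqrt(n))      # theoretical standard error = 2.0
```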

That said, assuming the above is correct, we only work with a single sample of size ##n## and have a single sample mean whose value could still be very far from the actual mean, i.e., we could be very off! Isn't that a problem? The idea that ##E[\bar X] = \mu## seems very abstract. I know that, in statistics, we have no choice but to deal with uncertainty... I guess knowing that ##E[\bar X] = \mu## gives us a little more confidence that our sample statistic is a decent result? It is a better type of uncertainty than other uncertainties...

The same idea applies to ##95\%## confidence intervals: if we took a million samples and calculated their CIs, the true population mean would be captured inside about 95% of those intervals. That is an interesting result, but, working with a single sample as we always do, it may well be possible that our constructed CI does not contain the true population parameter!

Thank you!
 
  • #2
Yes, what you say is correct.

However, it is not the only way to look at these issues. Instead of considering that there is a "true mean" at all you can instead consider the population mean itself to be a random variable. That is Bayesian probability.
 
  • #3
Dale said:
Yes, what you say is correct.

However, it is not the only way to look at these issues. Instead of considering that there is a "true mean" at all you can instead consider the population mean itself to be a random variable. That is Bayesian probability.
I see. I am not quite ready to get into Bayesian probability yet :)

Another thing I am wondering:
in statistics, lots of sophisticated statistical tests are run to check whether key assumptions are satisfied. For example, when we fit a model using OLS, the estimates may not be trustworthy if the assumptions about the residuals are not met...

In machine learning, working with lots of data, the main concern is the performance metrics assessing how good the model is... It is all about prediction and less about inference, correct? Does that mean that all those statistical tests are less of a concern when the amount of data is large and the goal is prediction?

thanks!
 
  • #4
fog37 said:
The idea that ##E[\bar X] = \mu## seems very abstract.
To me, it doesn't seem abstract to say that the formula we use has the same expected value as the correct answer.
fog37 said:
working with a single sample as we always do, it may well be possible that our constructed CI does not contain the true population parameter!
True, but you have 95% confidence that the ##CI_{95}## contains the true value. And if you want more confidence, you can work with other common confidence levels like 97.5% or 99%, which just expand the intervals to cover more.
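To make the coverage interpretation concrete, here is a minimal simulation sketch (Python with NumPy; the population parameters are arbitrary example values) that counts how often a ##z##-based 95% CI for the mean captures the true mean. For simplicity it uses the known-##\sigma## interval; the textbook version with the sample SD and a t critical value behaves similarly.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 170.0, 10.0, 25   # hypothetical population parameters
z = 1.96                         # two-sided 95% normal critical value
trials = 100_000

samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1)
half_width = z * sigma / np.sqrt(n)   # known-sigma interval for simplicity

# Fraction of intervals that actually contain the true mean.
covered = (xbar - half_width <= mu) & (mu <= xbar + half_width)
print(covered.mean())   # approximately 0.95
```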

In estimating the variance, ##\sigma^2##, of a random variable, you might consider ##\sum {(X_i-\mu)^2}/n##, ##\sum {(X_i-\bar X)^2}/n##, or ##\sum {(X_i-\bar X)^2}/(n-1)##.
1) ##\sum {(X_i-\mu)^2}/n## is unbiased, but it requires knowledge of the true population mean, ##\mu##.

2) ##\sum {(X_i-\bar X)^2}/n## uses the sample mean to estimate ##\mu##, but it is biased: it gives an estimate of the variance that tends to be low. The way to see this is to realize that each ##(X_i - \bar X)^2## is a little small because the value of ##X_i## has pulled ##\bar X## toward itself, which wouldn't happen with ##(X_i-\mu)^2## (a quick simulation after this list illustrates the effect).

3) ##\sum {(X_i-\bar X)^2}/(n-1)## does not need knowledge of ##\mu##, and it exactly corrects for the tendency of ##\sum {(X_i-\bar X)^2}/n## to be too small. It is unbiased.
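Here is a minimal simulation sketch (Python with NumPy; the population parameters are arbitrary example values) comparing the averages of estimators 2) and 3) across many samples. The ##1/n## version comes out low by roughly the factor ##(n-1)/n##, while the ##1/(n-1)## version lands on ##\sigma^2##.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 0.0, 2.0, 10   # hypothetical population parameters
trials = 200_000

samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1, keepdims=True)
ss = ((samples - xbar) ** 2).sum(axis=1)   # sum of squared deviations

print(np.mean(ss / n))        # ~ sigma^2 * (n-1)/n = 3.6  (biased low)
print(np.mean(ss / (n - 1)))  # ~ sigma^2 = 4.0            (unbiased)
```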
 
  • #5
fog37 said:
lots of sophisticated statistical tests are run to check for key assumptions being verified or not.
I would say that lots of sophisticated statistical tests should be run to check. Unfortunately, a lot of the time people just plug in their data and look for a p-value.
 
  • #6
fog37 said:
The same idea applies to ##95\%## confidence intervals: if we took a million samples and calculated their CIs, the true population mean would be captured inside about 95% of those intervals. That is an interesting result, but, working with a single sample as we always do, it may well be possible that our constructed CI does not contain the true population parameter!
You may choose any confidence level you like. 99.9999% is quite possible.
 
  • #7
Or you can just construct the confidence interval ##(-\infty,\infty)## and get 100% confidence.
 
  • #8
fog37 said:
That said, assuming the above is correct, we only work with a single sample of size ##n## and have a single sample mean whose value could still be very far from the actual mean, i.e., we could be very off! Isn't that a problem? The idea that ##E[\bar X] = \mu## seems very abstract. I know that, in statistics, we have no choice but to deal with uncertainty... I guess knowing that ##E[\bar X] = \mu## gives us a little more confidence that our sample statistic is a decent result?
Yes, there is a fundamental problem in conceptualizing statistics and probability theory. We seek to develop mathematics in a careful way so that its theorems are true, i.e. certainly true. So what happens when the subject matter of the mathematics is supposed to represent something that is uncertain?

The bottom line is that probability theory (when interpreted as uncertainty about whether something happens) only tells you things about probabilities. It has no theorems that tell you that something is certain to happen. (The closest you get to that type of result are theorems about limits of probabilities approaching 1.) The theorems of probability theory have the form: if the probability of something is such-and-such, then the probability of this other thing is so-and-so. You can see this pattern in sampling statistics. A sample statistic of a random variable is itself a random variable, so the sample statistic has a distribution. The distribution of the sample statistic has its own sample statistics, and they have their own distributions, etc.

So applying probability theory depends on the science of whatever subject you are applying it to. There are no theorems in probability theory that guarantee the correctness of using it in a particular way in all possible practical situations. Things like the familiar numbers used for "statistical significance" are not consequences of mathematical theorems. They have come into use because they have been empirically useful in many situations.
 
  • #9
Stephen Tashi said:
So applying probability theory depends on the science of whatever subject you are applying it to. There are no theorems in probability theory that guarantee the correctness of using it in a particular way in all possible practical situations. Things like the familiar numbers used for "statistical significance" are not consequences of mathematical theorems. They have come into use because they have been empirically useful in many situations.
I would like to point out that this is true of all applied mathematics. Physics was all applied mathematics until string theory emerged.

The challenge of General Relativity wasn't the math; it lay in convincing people that the mathematics corresponded to the real world.
 
  • #10
A slightly different comment here. From your post, this statement:

"we want our estimators to be unbiased so the expectation value of the mean "tends" to the actual population mean"

confuses two things. When we say an estimator is unbiased, we mean that its expected value equals the parameter being estimated, and this is true regardless of the sample size: it is a property of the "structure" of the estimator, not of any particular sample size.
Saying that the expectation "tends" to the actual population mean is not the same thing. That is asymptotic unbiasedness: the estimator may be biased for every finite ##n##, but the sequence of its expectations converges (as ##n## tends to infinity) to the target parameter. It is an asymptotic, limiting property.
I'll also comment that, occasionally, we're willing to accept an estimator with a small amount of bias if its standard error is significantly lower than that of an unbiased estimator.
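A standard worked example of the distinction, using the estimators from post #4: for ##\hat\sigma^2_n = \sum (X_i - \bar X)^2 / n## one can show ##E[\hat\sigma^2_n] = \frac{n-1}{n}\sigma^2##, which is below ##\sigma^2## for every finite ##n## (biased), yet ##\frac{n-1}{n}\sigma^2 \to \sigma^2## as ##n \to \infty## (asymptotically unbiased). By contrast, the ##1/(n-1)## version is exactly unbiased at every ##n##.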
 

What is unbiasedness of estimates?

Unbiasedness of estimates refers to the property of an estimator to produce estimates that are, on average, equal to the true value of the parameter being estimated. In other words, an unbiased estimator does not systematically overestimate or underestimate the true value.
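In symbols (a standard textbook formulation, not specific to this thread): an estimator ##\hat\theta## of a parameter ##\theta## is unbiased if ##E[\hat\theta] = \theta##, with the expectation taken over repeated sampling; the bias is defined as ##E[\hat\theta] - \theta##.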

Why is unbiasedness important in scientific research?

Unbiasedness is important because it ensures that, on average, the estimates we obtain from our data are centered on the true value. If our estimates are systematically biased, they may lead to incorrect conclusions and affect the validity of our research findings.

How can we determine if an estimator is unbiased?

To determine if an estimator is unbiased, we can compare its expected value to the true value of the parameter being estimated. If the expected value is equal to the true value, then the estimator is unbiased. We can also use simulations or theoretical proofs to assess the unbiasedness of an estimator.

Can an unbiased estimator still have a large variance?

Yes, an estimator can be unbiased and still have a large variance. Unbiasedness and variance are two separate properties of an estimator. An estimator can be unbiased but highly variable, which means that its estimates may vary widely from the true value. For example, a single observation ##X_1## is an unbiased estimator of the population mean but has variance ##\sigma^2##, whereas the sample mean ##\bar X## is also unbiased with the much smaller variance ##\sigma^2/n##.

How can we improve the unbiasedness of our estimates?

For many estimators, such as the ##1/n## variance estimator discussed above, the bias shrinks as the sample size grows, so a larger sample can help; note, though, that a larger sample does not remove bias from every estimator. Additionally, applying known bias corrections (like the ##1/(n-1)## factor) and considering potential sources of bias in the study design can also improve the unbiasedness of our estimates.
