Estimating uncertainty for a probability

In summary: I think the Wilson score interval with z = 1.96 would be the most appropriate choice for the binomial distribution; z = 1.96 is the familiar value corresponding to 95% confidence when the Gaussian distribution is a good approximation to the binomial.
  • #1
M_1
If I flip an old coin with irregularities 100 times and get side a up 55 times and side b up 45 times, I would say that the probability of side a coming up is 55%. But how can I estimate the uncertainty in this experiment? For example, can I say something like "with 95% confidence the probability is within 55% ± x%"?

Thanks!
 
  • #2
If you repeat the experiment many times, the number of heads follows a binomial distribution (repeated Bernoulli trials) that has a certain width.
Basically you are testing the hypothesis that p = 0.5 and you find a 0.05 deviation.
(Variance ##np(1-p) = 25##, so sigma = 5 counts -- an estimate.)
Such a deviation is one sigma from the expected mean, and as such not sensational.

You can claim that with 95% confidence you found a probability of ##(55 \pm 10)##%.

(Some rounding off is appropriate: the relative error in ##\sigma## itself is of the order of ##1/\sqrt{100}##, i.e. 10%.)
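As an illustration, here is a minimal Python sketch of this estimate (assuming the two-sigma normal approximation described above; the numbers, 55 heads in 100 flips, are from the original question):

```python
import math

n, k = 100, 55            # flips and observed count for side a
p_hat = k / n             # estimated probability, 0.55

# Width of the count distribution under the hypothesis p = 0.5
sigma_counts = math.sqrt(n * 0.5 * (1 - 0.5))   # sqrt(np(1-p)) = 5

# Approximate 95% (two-sigma) interval on the proportion,
# using the observed p_hat for the standard error
se = math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - 2 * se, p_hat + 2 * se
print(f"sigma = {sigma_counts:.1f} counts, 95% interval ~ {lo:.3f} .. {hi:.3f}")
```

This reproduces the ##(55 \pm 10)##% claim up to the small difference between using p = 0.5 and p = 0.55 in the standard error.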
 
  • #3
Hi M_1:

The equations you want can be found at the following link:
There are two aspects of an error range.
(1) State the +/- value. This is generally the calculated standard deviation or a multiple of it; the multiples most commonly used are 1 and 2.
(2) There are two ways to describe the confidence interval for the +/- value.
(a) State the number of standard deviations used.
(b) (This is the preferred method.) State the confidence percentage associated with that number of standard deviations. For the Gaussian distribution, the common multiples 1 and 2 correspond to confidence levels of about 68% and 95%, respectively.
You may also want to read about what a confidence interval actually means, since there are common misunderstandings about this.
The confidence interval generally assumes a Gaussian distribution. For the coin-flip example you can use the actual binomial distribution; if the sample is large enough, the Gaussian distribution is a good approximation.

Hope this is helpful.

Regards,
Buzz
 
  • #4
Many thanks BvU and Buzz Bloom for your help. Thanks to your advice I have now progressed considerably (I think/hope). I now realize it is more interesting to consider the case where side a comes up 99 times and side b 1 time. The estimate of p is then 99/100 = 99%. Using the "normal approximation interval" on https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval we obtain 99 ± 2% with 95% confidence (that is, using z = 1.96). This is obviously not very good, since p certainly cannot be 99 + 2 = 101%.
Using the formula for the Wilson score interval on the same page, again with z = 1.96, we obtain an interval for p of
94.6% – 99.8% (1), or
97.2 ± 2.6% (2), or
99% +0.8% / −4.4% (3)
with 95% confidence.
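For what it's worth, these numbers can be reproduced with a short Python sketch of the Wilson score formula from that Wikipedia page:

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes in n trials."""
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half, center + half

lo, hi = wilson_interval(99, 100)
print(f"{100*lo:.1f}% .. {100*hi:.1f}%")   # about 94.6% .. 99.8%
```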
This looks intuitively very good since the interval doesn’t reach above 100%. I still have two questions:
Q1: Is (1), (2), or (3) the most correct way of presenting the interval? I think it should be (3), since the most educated point estimate seems to be p = 99%. Furthermore I don't like (1), since I want to plot the result in a bar chart, and then it's nice to have a center point (99%) with error bars (+0.8% and −4.4%). But what do you think?
Q2: Is the approach correct? I can understand using z = 1.96 for the normal approximation, but for the binomial distribution I don't understand what z is, and I definitely have the impression that 1.96 comes from the number of standard deviations for 95% confidence of the normal distribution -- so basically I put z = 1.96 into the Wilson score interval without having a clue what I'm actually doing. But the result looks good! So do you think the approach is correct?
Thanks again, a fantastic forum!
 
  • #5
Hi again! Any comments to my questions Q1 and Q2 would be most welcome. (I realized that it was not obvious that I had two questions in my last posting.)
Thank you, Physics Forums!
 
  • #6
Hi M_1:

At my age, my math skills are not as good as when I was younger, so please take that into account when considering my comments.

I think there is a problem with using the value z = 1.96.
That value is based on a Gaussian assumption. When you evaluate the binomial distribution at an extreme outcome, 99 h 1 t, rather than a more central 55 h 45 t, I believe the Gaussian assumption will give a large error in the result.
I am unsure what concept of "confidence interval" you want to use. My guess is you want this one:
The explanation of a confidence interval can amount to something like: "The confidence interval represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the 10% level". In fact, this relates to one particular way in which a confidence interval may be constructed. (From https://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation .)​
I also guess that the Wilson score method is not the best choice if accuracy is the most important criterion. I suggest you may want to use the Clopper-Pearson interval instead, because
The Clopper-Pearson interval is an early and very common method for calculating binomial confidence intervals. This is often called an 'exact' method, but that is because it is based on the cumulative probabilities of the binomial distribution...
(From https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Clopper-Pearson_interval .)
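As a sketch of what "based on the cumulative probabilities of the binomial distribution" means in practice, the Clopper-Pearson bounds for the 99-out-of-100 case from this thread can be found with plain Python by bisecting on the binomial tail probabilities (no statistics library needed; the bisection approach is one way to do it, not the only one):

```python
import math

def tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact Clopper-Pearson interval, found by bisection on the tails."""
    def solve(cond):
        lo, hi = 0.0, 1.0          # cond(p) is True for small p, False for large p
        for _ in range(60):
            mid = (lo + hi) / 2
            if cond(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower bound: p where P(X >= k | p) rises through alpha/2
    lower = 0.0 if k == 0 else solve(lambda p: tail_ge(k, n, p) <= alpha / 2)
    # Upper bound: p where P(X <= k | p) falls through alpha/2
    upper = 1.0 if k == n else solve(lambda p: tail_ge(k + 1, n, p) <= 1 - alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(99, 100)
print(f"{lo:.4f} .. {hi:.5f}")   # roughly 0.9455 .. 0.99975
```

Note that the upper bound is much closer to 1 than the Wilson interval's 99.8%, reflecting how asymmetric the exact interval is this far out in the tail.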

I hope this is helpful.

Regards,
Buzz
 
  • #7
Hi M_1:

I have been thinking about your question some more, and I came up with an interpretation that feels right to me. It is based on the following:
"Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time"
With a Gaussian distribution, which is symmetric about the mean, it is reasonable to find a ±x, where x is a multiple of the standard deviation, such that a repeat of the experiment will produce a new calculated mean m inside the range m0 ± x (where m0 is the old mean) a specified percentage of the time. For example, if x = 2 standard deviations:
(1) Probability of m ∈ {m0 − x, m0 + x} > 95%.

So, we want to make a similar statement for the experiment that resulted in 99 h and 1 t. However, the result for this experiment is at the tail of the binomial distribution, which is far from Gaussian and not even symmetric. So instead of (1) we want something like
(2) Probability of m ∈ {Min, 100} > 95%.​
To find the value of Min, calculate the binomial probabilities for the integers 100, 99, 98, . . ., Min so that
(3) ##P_{100} + P_{99} + \dots + P_{Min} \ge 0.95##
To calculate these values:
(4) ##P_{100} = 0.99^{100}##
(5) ##P_k = P_{k+1} \, (0.01/0.99) \, (k+1)/(100-k)##
(each step down trades one factor of 0.99 for one of 0.01 and adjusts the binomial coefficient).
Carrying out this sum shows that Min turns out to be 97.
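A few lines of Python (a sketch that accumulates the upper tail of Binomial(100, 0.99) using the recurrence ##P_k = P_{k+1}(0.01/0.99)(k+1)/(100-k)##) give the value of Min directly:

```python
p = 0.99
pk = p ** 100            # P_100 = 0.99^100
total = pk
k = 100
while total < 0.95:      # accumulate P_100 + P_99 + ... until >= 95%
    k -= 1
    pk *= (0.01 / 0.99) * (k + 1) / (100 - k)
    total += pk
print(k, total)          # Min = 97, cumulative probability ~ 0.9816
```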

One more observation. This calculation is not exactly what the quoted interpretation above says. However, I think it will be a good approximation of what it says.

Regards,
Buzz
 

Related to Estimating uncertainty for a probability

1. What is "estimating uncertainty" for a probability?

Estimating uncertainty for a probability is the process of determining the potential range or degree of error in a given probability value. It involves analyzing the available data and using statistical methods to estimate the level of uncertainty in the probability calculation.

2. Why is it important to estimate uncertainty for a probability?

Estimating uncertainty for a probability is important because it allows us to understand the reliability and accuracy of the probability value. It helps us make informed decisions and assess the potential risks associated with a particular event or outcome.

3. How is uncertainty for a probability calculated?

Uncertainty for a probability is typically calculated using statistical methods such as confidence intervals or standard deviation. These methods take into account the sample size, variability of the data, and the level of confidence desired.

4. What factors can affect the uncertainty of a probability?

The uncertainty of a probability can be affected by various factors such as the amount and quality of data available, the assumptions made in the calculation, and the accuracy of the measurement or observation used to determine the probability. Additionally, external factors such as human error or environmental conditions can also contribute to uncertainty.

5. How can uncertainty in a probability be reduced?

Uncertainty in a probability can be reduced by increasing the amount and quality of data used in the calculation, improving the accuracy of measurements or observations, and minimizing the potential for human error. Additionally, using more sophisticated statistical methods and considering multiple sources of information can also help reduce uncertainty.
