Probably (yea I know, hilarious) easy qs about Bin and Normal distributions

In summary, the conversation discusses constructing a 95% confidence interval for the difference between two probabilities, p_1 and p_2. The probabilities are the likelihoods that a cabin hook will hold a force of 25 kN. Two samples, of sizes n_1=107 and n_2=92, are used to estimate them. The conversation also covers the use of the Binomial distribution and the calculation of the standard deviation of stochastic variables, as well as why the 97.5% quantile is used rather than the originally stated 95%. Python source code is suggested for testing ideas numerically.
  • #1
mathpariah
hey

Gonna get straight to the point. I need to establish the difference between two probabilities, p_1 and p_2, at a 95% confidence level. They're the two probabilities that a cabin hook will hold a certain force (25 kN).

Two samples: the "originals" with n_1=107, and the "cheap pirated ones" with n_2=92. y_i is the number of hooks in each sample that managed to successfully keep their acts together at 25 kN.

y_1=84, where Y_1 ~ Bin(107, p_1) ≈ N(107*p_1, sqrt(107*p_1*q_1)) with q_1 = 1 - p_1

and

y_2=12, where Y_2 ~ Bin(92, p_2) ≈ N(92*p_2, sqrt(92*p_2*q_2)) with q_2 = 1 - p_2

now p_1 and p_2 can be estimated as follows:

^p_1 = y_1/107, which is an observation from ^P_1 ≈ N(p_1, sqrt((p_1*q_1)/107))

and ^p_2 = y_2/92, an observation from ^P_2 ≈ N(p_2, sqrt((p_2*q_2)/92))

which gives us the estimated probability difference of:

^P_1 - ^P_2 ≈ N(p_1 - p_2, sqrt((p_1*q_1/107) + (p_2*q_2/92)))

which means you get the variable:

(^P_1 - ^P_2 - (p_1 - p_2)) / sqrt((^P_1*^Q_1/107) + (^P_2*^Q_2/92)) ≈ N(0, 1)

Closing this out with z = z_0.975 = 1.96 from the normal distribution table gives me

INTERVAL_(p_1-p_2) = (^p_1 - ^p_2 ± 1.96*sqrt((^p_1*^q_1/107) + (^p_2*^q_2/92))) = (0.55, 0.76)
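To double-check the arithmetic, here is a short Python sketch (my own, not from the thread) that reproduces the interval from the numbers above:

```python
import math

# Numbers from the post: successes and sample sizes
y1, n1 = 84, 107   # original hooks that held 25 kN
y2, n2 = 12, 92    # pirated hooks that held 25 kN

p1 = y1 / n1
p2 = y2 / n2

# Standard error of the difference: variances add, then take the square root
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = 1.96  # z_0.975, the 97.5% quantile of N(0, 1)
lo, hi = p1 - p2 - z * se, p1 - p2 + z * se
print(round(lo, 2), round(hi, 2))  # → 0.55 0.76
```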

This solution is supposedly correct and all I need is someone to help me understand the following:


1. When you subtract two stochastic variables, you subtract the expected values from each other, which I get, but the standard deviation is different... it becomes the square root of the sum of the independent stochastic variables' variances?

2. Why is the standard deviation of the stochastic variable ^P_1 equal to sqrt((p_1*q_1)/n_1)? Can't you just call it σ_1? Are they the same? And is that always the case? Is the probability of something happening, times one minus that probability, divided by the sample size always equal to σ^2?

3. Why do you use the 97.5% quantile when the question originally stated 95%? I know you use 1 - α/2 to get there, but I've never understood WHEN you can just go with 95% and when you have to use 97.5%. For F distributions it seems going with 95% is OK even with two samples.

4. Why do you use a Binomial distribution for this kind of problem?
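(Regarding questions 1 and 2, a quick Monte Carlo sketch can make this concrete. The true p values below are made up purely for illustration; the point is that the variances of the two independent estimators add, and that sd(^P) matches sqrt(p*q/n).)

```python
import math
import random

random.seed(0)
n1, p1 = 107, 0.8    # hypothetical true probabilities, chosen for illustration
n2, p2 = 92, 0.13
trials = 20_000

# Simulate the estimator difference ^P_1 - ^P_2 many times
diffs = []
for _ in range(trials):
    y1 = sum(random.random() < p1 for _ in range(n1))
    y2 = sum(random.random() < p2 for _ in range(n2))
    diffs.append(y1 / n1 - y2 / n2)

mean = sum(diffs) / trials
sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / trials)

# Variances (not standard deviations) add for independent variables
sd_theory = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(round(sd, 3), round(sd_theory, 3))  # empirical sd vs. sqrt(p1*q1/n1 + p2*q2/n2)
```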


Would be pretty much amazing if anyone could help out with any or all of these questions; I am a donkey when it comes to math stat. Thanks.
 
  • #2
I am working on the same kinds of questions in my sigfig thread, and I'm not getting help from people more familiar with the material either.

As to question 1.

Take a random variable with a mean of 36 and a standard deviation of 1, e.g. 36(1).
If two data points are selected from the same variable, we might get:
a = 36 + 1, b = 36 + 1.5

By definition, a is a 1-sigma deviation and b is a 1.5-sigma deviation.
P(≥ a) ≈ 15.87%
P(≥ b) ≈ 6.68%

The resulting deviation of the sum is 1 + 1.5 = 2.5, and the resulting mean is 72.
Hence there is a probability of at least 0.1587 × 0.0668 ≈ 1.06% of getting a deviation of 2.5 or MORE in the result, i.e. a probability of around 1%.
But the chance of *sampling* a single data point from the original variable that is 2.5 or more deviations away is only 0.62% (much less likely).

Hence, the result is *more* likely to have deviations of 2.5 sigmas away from the mean than either of the original data points.

The actual probability of the sum will be higher than I have listed; data points with a deviation of, say, 1.25 are much more likely than ones with 1.5, and a deviation of 1.25 + 1.25 is still 2.5, so there are many ways of getting the resulting SUM that I have arbitrarily excluded.

I like to think of stochastic/random variables as having a constant mean and a random variation. The square root in the addition, I think, has to do with the idea of a "random walk," and it is not a perfect estimator of the new deviation in all cases of error propagation.
E.g., in multiplication instead of addition, there is a definite problem with the typical formulas for error propagation, which I am exploring now in the sigfig calculator thread.

I need to do a little work on that, so I won't attempt to answer questions 2+ of your thread at this time; I need to refresh my memory on these points anyhow -- and I am not getting any more help in my sigfig thread than you are here... (at least, not yet.)

There is Python source code in that thread that may be helpful for setting up some quick "what if" experiments to test your ideas out numerically. If you need some help getting started with Python, or getting Python (it's free), don't let that deter you from trying it out -- I and many others can surely help. You can delete the parts of the Python program you don't need, or just not use them.
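As a concrete "what if" in that spirit, here is a minimal sketch (my own illustration, not from that thread) checking that when two independent variables are added, the mean doubles but the standard deviations combine as the square root of the sum of their squares:

```python
import math
import random

random.seed(1)
N = 100_000

# Sum of two independent draws from N(36, 1), as in the example above
sums = [random.gauss(36, 1) + random.gauss(36, 1) for _ in range(N)]

mean = sum(sums) / N
sd = math.sqrt(sum((s - mean) ** 2 for s in sums) / N)

# The mean doubles to 72, but the sd grows only to sqrt(1^2 + 1^2) = sqrt(2) ≈ 1.41
print(round(mean, 1), round(sd, 2))
```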
 

Related to Probably (yea I know, hilarious) easy qs about Bin and Normal distributions

1. What is the difference between a binomial and normal distribution?

A binomial distribution is a discrete probability distribution that describes the outcomes of a fixed number of independent trials with two possible outcomes (success or failure) and a constant probability of success. A normal distribution is a continuous probability distribution that describes a symmetrical bell-shaped curve with most data points falling near the mean and fewer points at the tails.

2. How do you calculate the mean and standard deviation for a binomial distribution?

The mean of a binomial distribution is equal to the number of trials multiplied by the probability of success. The standard deviation is equal to the square root of the number of trials multiplied by the probability of success multiplied by the probability of failure.
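A small simulation sketch (illustrative numbers only, loosely based on the "original hooks" sample in the thread) confirms mean = n*p and sd = sqrt(n*p*q):

```python
import math
import random

random.seed(2)
n, p = 107, 0.785   # roughly the "original hooks" sample from the thread
trials = 20_000

# Draw many Bin(n, p) counts by summing Bernoulli trials
counts = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

mean = sum(counts) / trials
sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / trials)

print(round(mean, 1), round(sd, 2))                            # empirical
print(round(n * p, 1), round(math.sqrt(n * p * (1 - p)), 2))   # theory: 84.0 4.25
```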

3. Can the mean and standard deviation of a binomial distribution be used to approximate a normal distribution?

Yes. By the central limit theorem, as the number of trials in a binomial distribution increases, the distribution becomes closer and closer to a normal distribution with the same mean and standard deviation.
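To see how good the approximation already is at a sample size like the thread's, this sketch (the p value is illustrative) compares the exact binomial CDF with the continuity-corrected normal CDF, built from `math.erf`:

```python
import math

n, p = 107, 0.785
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

def binom_cdf(k: int) -> float:
    # Exact P(Y <= k) for Y ~ Bin(n, p)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x: float) -> float:
    # Phi((x - mu) / sigma) expressed via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Continuity correction: compare P(Y <= 84) with Phi evaluated at 84.5
print(round(binom_cdf(84), 3), round(normal_cdf(84.5), 3))
```

The two printed values agree to about two decimal places, which is why the normal approximation is routinely used at sample sizes around 100.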

4. What are some real-world examples of binomial distributions?

Some examples include the number of heads in repeated coin flips, the number of sixes in repeated die rolls, and success rates in sports or business endeavors.

5. How can the normal distribution be applied in scientific research?

The normal distribution is commonly used in statistical analysis to describe the distribution of a continuous variable, such as height or weight, in a population. It is also used in hypothesis testing and confidence interval calculations.
