Binomial Distribution Question

In summary, Jonas found that the probability of one specific choice (A-E) appearing 4, 5, or 6 times on a 25-question Chemistry exam is .546042, and that treating the five choices as independent gives .048543 (.546042^5) for all five choices falling in that range. Because the five counts must sum to 25, they are not independent, so .048543 is only an approximation to the exact multinomial probability.
  • #1
JonasJSchreibe
Hi, I am new here, and my name is Jonas. I'm a CS major at a university in the Northeast US. I'm a senior and wrapping up degree requirements which include a science track. I chose Chemistry because Physics was full.

The chemistry exams are multiple choice (because you couldn't grade 300 exams in a timely fashion any other way), choices A-E, and have 25 questions.

It turns out that the answers to these exams have, it seems to me, an unlikely distribution of A's, B's, C's, D's, and E's.

I was wondering how I would find the likelihood that an exam has no more than 6 and no fewer than 4 of any one choice.

I was thinking it'd be easier to compute 1 - (probability of 7 or more of the same selection + probability of 3 or fewer of the same selection). With p = 1/5 for each choice, that is 1 - ((summation from k = 7 to k = 25 of (25 choose k) (1/5)^k (4/5)^(25-k)) + (summation from k = 0 to k = 3 of (25 choose k) (1/5)^k (4/5)^(25-k))).

However, it seems that this would be the likelihood that there are no more than 6, no less than 4 of one specific choice (A-E), rather than all choices. Is this correct, or am I totally off track?
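
A minimal sketch of this single-choice calculation, assuming every question independently has a 1/5 chance of landing on a given choice. The helper name `binom_pmf` is illustrative only. Each term in the sums is the full binomial term (25 choose k) (1/5)^k (4/5)^(25-k), and the complement route is checked against the direct sum over k = 4, 5, 6.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 25, 1 / 5  # 25 questions; assumes all 5 choices are equally likely

# Complement route: 1 - (P(X >= 7) + P(X <= 3))
upper_tail = sum(binom_pmf(k, n, p) for k in range(7, n + 1))
lower_tail = sum(binom_pmf(k, n, p) for k in range(0, 4))
via_complement = 1 - (upper_tail + lower_tail)

# Direct route: P(4 <= X <= 6)
direct = sum(binom_pmf(k, n, p) for k in range(4, 7))

print(f"{via_complement:.6f} {direct:.6f}")
```

Both routes give the same number, and it describes one specific choice only, not all five at once.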

Here is what made me so curious. There are four forms of each exam to prevent cheating: the people to your left and right work from different forms than you, as do the people in front of and behind you. Could someone point me in the right direction here?

Thanks!

http://img600.imageshack.us/img600/1001/47634578.png

Here, I have an Excel table with the probabilities I mentioned. The probability that one choice (A-E) will have 4, 5, or 6 occurrences is .546042. If the five counts were independent, the probability that all of them would is .048543 (that's .546042 ^ 5). But this ignores the fact that there are only 25 questions, so it is impossible, for example, for all choices to have 6 occurrences. I am not sure how to correct for this, but maybe you guys will be.

http://img577.imageshack.us/img577/3518/binomialdistributions.png
 
  • #2
Hey JonasJSchreibe.

My recommendation is you use what is called a multinomial distribution and a Pearson Chi-Square Goodness of Fit (GOF) test.

You can estimate the proportions from the sample: p_i = (number of times the ith possibility occurs) / (total count).

Have you used a Chi-Square GOF test before?
 
  • #3
I don't believe I have. However, I realized that with 25 questions the only possible sets of counts in which every choice A-E occurs between 4 and 6 times are {5,5,5,5,5}, {5,5,5,4,6}, and {5,4,6,4,6}. Does this help me in any way?

I suppose I could add up the number of answer keys for {5,5,5,5,5}, for every permutation of {5,5,5,4,6}, and for every permutation of {5,4,6,4,6}, then divide the total by 5^25. Would that yield the correct result?
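
A minimal sketch of that count, assuming each of the 5^25 possible answer keys is equally likely. The function name `multinomial_coeff` is illustrative. Looping over ordered count vectors (A, B, C, D, E) covers every permutation of the three patterns automatically.

```python
from itertools import product
from math import factorial

N_QUESTIONS, N_CHOICES = 25, 5

def multinomial_coeff(counts):
    """Number of answer keys with exactly these per-choice counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

# Ordered count vectors with every entry in 4..6 that sum to 25.
vectors = [
    counts
    for counts in product(range(4, 7), repeat=N_CHOICES)
    if sum(counts) == N_QUESTIONS
]

favourable = sum(multinomial_coeff(counts) for counts in vectors)
exact = favourable / N_CHOICES**N_QUESTIONS

print(len(vectors), round(exact, 4))
```

Under these assumptions there are 1 + 20 + 30 = 51 ordered count vectors, and the probability comes to about 0.0805. That is larger than .546042^5 ≈ 0.0485, so the independent-product figure does not act as an upper bound here.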
 
  • #4
If you want to see whether your observed (i.e. sample) data is different from some expected distribution, a typical way to test this is to use the Chi-Square Goodness of Fit.

What you can do is construct several of these tests, one for each candidate distribution under your criteria, and see whether you can reject all of them (or a significant number).

This will basically give you a way to statistically gauge the answer to your question.
 
  • #5
Looking at the Wikipedia page for Goodness of Fit scares me. I seem to remember a least-squares regression analysis which was used to determine causality vs correlation that looked a bit like this. I balked at it when I saw the Wikipedia page. Hopefully there is a better resource online to help me understand this or just run the calculations with marginal participation on my part. It's just something I had an interest in, not for work or school or anything, just curiosity.

EDIT: I have covered the multinomial distribution in a probability-in-computing class, and I recognize its probability mass function as something that could help me. Is there any reason to use more advanced methods for this problem?
 
  • #6
The GOF is a lot simpler than it looks.

You basically calculate the expected frequency Ei for each cell under the expected distribution and then use the formula to get the X^2 statistic. (In other words, add up the (Oi-Ei)^2/Ei terms to get X^2.)

Then you compare this test statistic to a Chi-Square distribution with the right degrees of freedom (for a multinomial with n choices the DF is n - 1).

If P(Chi-Square > X^2) < alpha (usually 0.05), then you reject the hypothesis that the observed counts come from the expected distribution.

That's all there is to it.
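
A minimal sketch of those steps in plain Python, under stated assumptions: the observed counts are made up for illustration, the expected key is uniform (5 of each choice), and the p-value uses the closed form for the Chi-Square upper tail that holds only for even degrees of freedom (here DF = 4). The function names are illustrative.

```python
from math import exp, factorial

def chi_square_statistic(observed, expected):
    """Pearson X^2: sum of (O_i - E_i)^2 / E_i over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_sf_even_df(x, df):
    """P(Chi-Square > x), using the closed form valid for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form only covers even df")
    half = x / 2
    return exp(-half) * sum(half**j / factorial(j) for j in range(df // 2))

observed = [5, 4, 6, 4, 6]  # hypothetical answer-key counts for A-E
expected = [25 / 5] * 5     # uniform key: 5 of each choice
alpha = 0.05

x2 = chi_square_statistic(observed, expected)
p_value = chi_square_sf_even_df(x2, df=len(observed) - 1)

print(f"X^2 = {x2:.2f}, p = {p_value:.4f}, reject = {p_value < alpha}")
```

With only 5 expected answers per cell the Chi-Square approximation is rough. Note also that this upper-tail test looks for counts that are too uneven; counts that are suspiciously even would show up as a very small X^2 instead, i.e. in the lower tail.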
 
  • #7
Well, my table puts the probability for a single exam at about .048543; I think that's correct. For two consecutive exams that becomes .048543^2 ≈ .0024, or odds of roughly 420:1, i.e. very unlikely. I just wanted to determine whether this freak occurrence is just that, or by design. If it is the latter, I can use this information when I'm stuck on an answer or two in the final exam to gain an edge.
 
  • #8
Can you show us what hypotheses you are testing along with your test statistic?
 
  • #9
There is a 54.6% chance of having 4, 5, or 6 occurrences of a specific choice (A-E). That, to the fifth power (because there are 5 choices) is 4.8%. Having this occurrence (all 5 choices have between 4 and 6 occurrences) on two consecutive exams is less than .25% (4.8%^2).

This statistic, however, is only an estimate. It treats the five counts as independent, so it includes impossible cases such as every choice occurring 6 times; with only 25 questions the counts must sum to 25. At most two choices can occur 4 times, and at most two can occur 6 times. The only possibilities are {5, 5, 5, 5, 5}, all permutations of {5, 5, 5, 4, 6} and all permutations of {5, 4, 4, 6, 6}.

So these probabilities, 4.8% and .25%, are approximations; the exact values come from summing the multinomial probabilities of those arrangements.
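
One way to cross-check figures like these is a quick simulation. The sketch below assumes each answer is drawn uniformly and independently from A-E, which is an assumption about how keys are generated, not something known about these exams. The function names, trial count, and seed are arbitrary.

```python
import random
from collections import Counter

CHOICES = "ABCDE"

def all_counts_in_range(key, lo=4, hi=6):
    """True if every choice appears between lo and hi times in the key."""
    counts = Counter(key)
    return all(lo <= counts[c] <= hi for c in CHOICES)

def estimate(trials=20_000, n_questions=25, seed=0):
    """Fraction of random answer keys whose counts all fall in 4..6."""
    rng = random.Random(seed)
    hits = sum(
        all_counts_in_range(rng.choices(CHOICES, k=n_questions))
        for _ in range(trials)
    )
    return hits / trials

single_exam = estimate()
two_exams = single_exam**2  # assumes the two keys are independent

print(f"one exam: {single_exam:.3f}, two exams: {two_exams:.4f}")
```

The single-exam estimate can be compared directly with the 4.8% figure: under these assumptions it should land near 8%, above the product of the individual probabilities rather than below it.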
 

Related to Binomial Distribution Question

1. What is a binomial distribution?

A binomial distribution is a probability distribution that describes the likelihood of obtaining a certain number of successes in a fixed number of independent trials, where each trial has two possible outcomes (success or failure) and the probability of success is constant for all trials.

2. What are the key characteristics of a binomial distribution?

The key characteristics of a binomial distribution are: a fixed number of trials, two possible outcomes for each trial, a constant probability of success for each trial, and independence between trials.

3. How is a binomial distribution different from a normal distribution?

A binomial distribution is different from a normal distribution in that it is discrete rather than continuous, and it is used to model a set number of trials with only two possible outcomes, whereas a normal distribution is used to model continuous data with a range of possible outcomes.

4. How do you calculate the mean and standard deviation of a binomial distribution?

The mean of a binomial distribution is calculated by multiplying the number of trials by the probability of success for each trial. The standard deviation is calculated using the formula sqrt(n * p * q), where n is the number of trials, p is the probability of success, and q is the probability of failure.

5. What types of problems can be solved using the binomial distribution?

The binomial distribution can be used to solve problems related to the likelihood of obtaining a specific number of successes in a set number of trials, such as the probability of flipping a coin and getting heads 5 times out of 10 flips.
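
As a small worked example of the formulas above, here is a sketch that applies them to the 25-question exam from this thread (assuming p = 1/5 for a given choice) and to the coin-flip case. The function names are illustrative.

```python
from math import comb, sqrt

def binom_mean_sd(n, p):
    """Mean n*p and standard deviation sqrt(n*p*q) of Binomial(n, p)."""
    q = 1 - p
    return n * p, sqrt(n * p * q)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Occurrences of one choice on a 25-question, 5-choice exam
mean, sd = binom_mean_sd(25, 0.2)

# Exactly 5 heads in 10 flips of a fair coin
five_heads = binom_pmf(5, 10, 0.5)

print(mean, sd, five_heads)
```

For the exam the mean is 5 and the standard deviation is 2, so counts of 4 to 6 sit within half a standard deviation of the mean. The coin-flip probability is 252/1024, about 0.246.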
