Hypergeometric Probability Testing: Simple Question

In summary: Neil is trying to calculate the probability of getting a given number of positive results when a sample is drawn without replacement from a larger set, using the hypergeometric distribution. He expects two cumulative probabilities from the same draw to sum to 1, but they do not, because the two events he computes are not complements of each other.
  • #1
neil.thompson
Hi everyone.



So I'm afraid I don't really know much about statistics, but I am trying to learn by working through a book, and taking some examples (I have mathematics experience, but from a biological perspective).

Just now, I am looking at the hypergeometric probability distribution. I have access to MATLAB so I have been playing around with examples in that. As I understand it, the hypergeometric probability distribution gives the probability of a number of positive results, given selection of a sample from a greater set (where the total successes are known). That seems simple enough.

However, I also expect that, as with everything else, the total probability sums to 1. So I am trying examples in MATLAB (http://www.mathworks.co.uk/help/toolbox/stats/hygecdf.html - there is an example of how to use it there too) and am obviously doing something wrong. Imagine a total set of 61 balls: 30 of them are red and 31 of them are black. I take a sample of 34, without replacement, and find that 14 are red and 20 are black.

I think this is OK - there seems to be no requirement on equal divisions between the colours or anything like that. So I run, for the cumulative probability:

Red=hygecdf(14,60,30,34);
Black=hygecdf(20,61,30,34);

I get Red = 0.1260, and Black = 0.9520. I think that these two calculations should be equivalent - they both have the same sample size (34) - and that they should sum to 1, but obviously they do not. I am doing something very basic wrong!


Sorry for all the words!

thank you,

Neil.
 
  • #2
neil.thompson said:
Hi everyone.
So I'm afraid I don't really know much about statistics, but I am trying to learn by working through a book, and taking some examples (I have mathematics experience, but from a biological perspective).

Just now, I am looking at the hypergeometric probability distribution. I have access to MATLAB so I have been playing around with examples in that. As I understand it, the hypergeometric probability distribution gives the probability of a number of positive results, given selection of a sample from a greater set (where the total successes are known).

I get Red = 0.1260, and Black = 0.9520. I think that these two calculations should be equivalent - they both have the same sample size (34) - and that they should sum to 1, but obviously they do not. I am doing something very basic wrong!

I don't quite follow your setup, but I can give you an example of the hypergeometric distribution in terms of a 2 x 2 table with fixed marginal totals:

a b

c d

where a+b, c+d, a+c and b+d are all fixed. Obviously the sum of all entries is also fixed. This is a two-way contingency table in which the cell counts a, b, c, d follow a hypergeometric distribution when subject to the marginal constraints. Can you put your problem into this form? If you calculate probabilities based on individual row or column totals, such as P(a) = a/(a+b), it is the probabilities P(a) and P(b) = b/(a+b) that must sum to one, because they share the marginal total a+b. You need to check that you are using the appropriate denominators.

http://data.princeton.edu/wws509/notes/c5s1.html
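As a rough illustration of the table form above, here is a minimal Python sketch (Python rather than MATLAB, standard library only) that puts the red/black example from post #1 into a 2 x 2 table. It assumes the rows are the colours (red, black) and the columns are drawn / not drawn; the function names are hypothetical, chosen only for this sketch. With all margins fixed, choosing cell a determines b, c and d, and the probabilities over every feasible a should sum to one.

```python
from math import comb

def table_from_a(a, row1, row2, col1):
    """Fill a 2 x 2 table with fixed margins once the top-left cell a is chosen.

    row1 = a + b, row2 = c + d, col1 = a + c (all fixed in advance).
    """
    b = row1 - a
    c = col1 - a
    d = row2 - c
    return a, b, c, d

def table_probability(a, b, c, d):
    """Hypergeometric probability of one table, given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

# Margins for the example: 30 red, 31 black, 34 balls drawn.
ROW_RED, ROW_BLACK, COL_DRAWN = 30, 31, 34

# Observed table: 14 red drawn, so 16 red left, 20 black drawn, 11 black left.
observed = table_from_a(14, ROW_RED, ROW_BLACK, COL_DRAWN)

# Every value of a that keeps all four cells non-negative.
feasible_a = range(max(0, COL_DRAWN - ROW_BLACK), min(ROW_RED, COL_DRAWN) + 1)
total = sum(
    table_probability(*table_from_a(a, ROW_RED, ROW_BLACK, COL_DRAWN))
    for a in feasible_a
)

print("observed table (a, b, c, d):", observed)
print("P(observed table):", table_probability(*observed))
print("sum over all feasible tables:", total)
```

The point of the sketch is that the quantity summing to one is the probability over all possible tables with the same margins, not a pair of cumulative probabilities taken from two different cells.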
 
  • #3
neil.thompson said:
Red = hygecdf(14,60,30,34)

This is the probability that you will draw no more than 14 red balls when drawing 34 balls from a bag containing 60 balls, 30 of which are red.

neil.thompson said:
Black = hygecdf(20,61,30,34)

This is the probability that you will draw no more than 20 black balls when drawing 34 balls from a bag containing 61 balls, 30 of which are black.

Neither of these MATLAB statements describes your result. The appropriate statements are:

Red = hygecdf(14,61,30,34);
Black = hygecdf(20,61,31,34);

... from the numerical results you give I can see that these were in fact the statements you used (that killed a bit of time!)
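For anyone without MATLAB to hand, the same comparison can be sketched in plain Python. This is only an illustration, assuming the argument order hygecdf(x, M, K, N) described above (M items in total, K of the counted colour, N drawn); hyge_cdf is a hypothetical helper written for this sketch, not a library function. It evaluates the statements as typed in post #1 next to the corrected ones, so the results can be compared with the 0.1260 and 0.9520 reported there.

```python
from math import comb

def hyge_cdf(x, M, K, N):
    """P(at most x marked items) when drawing N items without replacement
    from M items, K of which are marked. Same argument order as hygecdf."""
    lo = max(0, N - (M - K))   # fewest marked items the sample can contain
    hi = min(K, N, x)          # largest count included in the cumulative sum
    ways = sum(comb(K, i) * comb(M - K, N - i) for i in range(lo, hi + 1))
    return ways / comb(M, N)

# Statements exactly as typed in post #1.
as_typed = {
    "Red": hyge_cdf(14, 60, 30, 34),
    "Black": hyge_cdf(20, 61, 30, 34),
}

# Corrected statements: 61 balls in total, 30 red, 31 black.
corrected = {
    "Red": hyge_cdf(14, 61, 30, 34),
    "Black": hyge_cdf(20, 61, 31, 34),
}

for name in ("Red", "Black"):
    print(f"{name}: as typed {as_typed[name]:.4f}, corrected {corrected[name]:.4f}")
```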

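The reason Red and Black do not sum to 1 is that these partial results are not mutually exclusive - of course we know that, because they both happened in the same trial! The outcome of exactly 14 red and 20 black is counted in both, so the sum overshoots 1 by the probability of that one outcome (0.1260 + 0.9520 - 1 = 0.0780).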

If you want two probabilities that sum to 1, you want Black to be 1 - Red, in other words Black must be the complement of Red. As Red is the probability that no more than 14 of the balls will be red, Black needs to be the probability that more than 14 of the balls will be red - which would mean that no more than 19 can be black.

Lo and behold, hygecdf(19,61,31,34) = 0.8740 which is 1 - 0.1260.
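The complement argument can also be checked numerically. The following is a minimal Python sketch using only the standard library; hyge_pmf and hyge_cdf are hypothetical helpers that assume the same argument order as hygecdf(x, M, K, N). It checks that "no more than 14 red" and "no more than 19 black" are complements, and that "no more than 14 red" plus "no more than 20 black" exceeds 1 by exactly the probability of drawing 14 red.

```python
from math import comb

def hyge_pmf(x, M, K, N):
    """P(exactly x marked items) in N draws without replacement
    from M items, K of which are marked."""
    if x < max(0, N - (M - K)) or x > min(K, N):
        return 0.0  # outside the possible range of counts
    return comb(K, x) * comb(M - K, N - x) / comb(M, N)

def hyge_cdf(x, M, K, N):
    """P(at most x marked items), summing the pmf from 0 up to x."""
    return sum(hyge_pmf(i, M, K, N) for i in range(0, x + 1))

M, N = 61, 34          # total balls, balls drawn
RED, BLACK = 30, 31    # balls of each colour

red_le_14 = hyge_cdf(14, M, RED, N)
black_le_19 = hyge_cdf(19, M, BLACK, N)
black_le_20 = hyge_cdf(20, M, BLACK, N)
red_eq_14 = hyge_pmf(14, M, RED, N)

# Complementary events: at most 14 red vs. at least 15 red (at most 19 black).
print("red <= 14 plus black <= 19:", red_le_14 + black_le_19)

# Overlapping events: both include the outcome 14 red / 20 black.
print("red <= 14 plus black <= 20:", red_le_14 + black_le_20)
print("1 + P(red == 14):          ", 1 + red_eq_14)
```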
 

Related to Hypergeometric Probability Testing: Simple Question

1. What is hypergeometric probability testing?

Hypergeometric probability testing uses the hypergeometric distribution to calculate how likely a given count of successes is when a sample is drawn without replacement from a finite population containing a known number of successes.

2. How does hypergeometric probability testing differ from other statistical tests?

Hypergeometric probability testing is designed for categorical (two-category) data, while other tests may be better suited to continuous or interval data. It models sampling without replacement from a finite population, so it accounts for both the sample size and the population size; the binomial distribution, by contrast, assumes sampling with replacement or an effectively unlimited population.

3. When should hypergeometric probability testing be used?

Hypergeometric probability testing should be used when the sample is drawn without replacement from a finite population of known size, particularly when the sample is a substantial fraction of that population, and you want the probability of obtaining a certain number of successes.

4. What are the assumptions of hypergeometric probability testing?

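The assumptions of hypergeometric probability testing include: each item falls into one of two categories, the sample is randomly selected from the population without replacement, and the population size and the number of successes in it are known. The individual draws are not independent of each other, because each draw changes the composition of what remains.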

5. How do you interpret the results of a hypergeometric probability test?

The results of a hypergeometric probability test will give you a p-value, which is the probability, under the null hypothesis, of obtaining a result at least as extreme as the one observed. If the p-value is less than the chosen significance level, the null hypothesis is rejected, which suggests a relationship between the variables being tested.
