Calculating the probability that a random subset of a population contains duplicates

mads

New member
Mar 29, 2012
1
Hi,

Apologies that this is a basic question, but I have to start somewhere! (-:

The problem is succinctly stated in the message title but, in greater detail: I'm working with some biological data from which samples have been taken. The sampling should have been at random. The samples include duplicates. What I need to know is how to calculate the expected number of duplicates in a sample of a given size drawn from a population of a given size.

For example, if I have a population of size p = 3 million and take s = 3 million samples, then I would expect more duplicates among the samples than if I take s = 300 thousand samples.

But how do I calculate the expected rate given various values of p and s?
I have access to R & should be able to find my way to any libraries which might be helpful in answering this. Thanks

m
 

Mr Fantastic

Member
Jan 26, 2012
66
If I understand the problem correctly, then I think you should take a look at the hypergeometric distribution (use your preferred search engine).
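For reference, here is a minimal sketch of the hypergeometric probability mass function in Python (the thread mentions R, so Python is only an illustrative choice). It assumes the standard setup: a population of N items, K of which are "marked", and a sample of n items drawn without replacement. The function name and parameter names are hypothetical, and whether this model fits the problem depends on what counts as a duplicate.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k marked items in a sample of n drawn without
    replacement from N items, of which K are marked).

    All names are illustrative assumptions, not from the thread.
    """
    # Outcomes that cannot occur have probability zero.
    if k < 0 or k > n or k > K or (n - k) > (N - K):
        return 0.0
    # Ways to pick k marked and n - k unmarked items,
    # divided by all ways to pick n items.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


if __name__ == "__main__":
    # Small example: 10 items, 3 marked, sample of 2.
    for k in range(3):
        print(k, hypergeom_pmf(k, N=10, K=3, n=2))
```

The probabilities over all feasible k sum to 1, which is a quick sanity check before using the function on larger numbers.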
 

awkward

Member
Feb 18, 2012
36
Hi Mads,

What do you mean by a "duplicate"? Do you mean it's like you caught a fish, threw it back into the lake, and then caught the same fish again? Or is it like catching another fish of the same species? And to pursue the fishing analogy further, do you return the fish to the lake ("sampling with replacement"), or do you keep it ("sampling without replacement")?
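If the answer turns out to be "the same fish caught again, with replacement", the expected number of duplicates has a simple closed form. The sketch below is in Python rather than R and rests on assumptions not confirmed in the thread: every one of the p individuals is equally likely on each draw, draws are independent, and a "duplicate" is any draw that repeats an individual already drawn. Each individual is missed by all s draws with probability (1 - 1/p)^s, so the expected number of distinct individuals is p * (1 - (1 - 1/p)^s), and the expected number of duplicates is s minus that. The function name is hypothetical.

```python
import math


def expected_duplicates(p, s):
    """Expected number of draws that repeat an earlier draw when
    s draws are made uniformly at random, with replacement, from a
    population of p distinct individuals (assumed model).
    """
    if p < 1 or s < 0:
        raise ValueError("need p >= 1 and s >= 0")
    if p == 1:
        # Only one individual: every draw after the first repeats it.
        return float(max(s - 1, 0))
    # P(a given individual is drawn at least once)
    #   = 1 - (1 - 1/p)**s, computed via log1p/expm1 so that
    # precision is kept when p is in the millions.
    prob_drawn = -math.expm1(s * math.log1p(-1.0 / p))
    expected_distinct = p * prob_drawn
    return s - expected_distinct


if __name__ == "__main__":
    population = 3_000_000
    for sample_size in (300_000, 3_000_000):
        dup = expected_duplicates(population, sample_size)
        print(sample_size, round(dup), round(dup / sample_size, 4))
```

Under these assumptions, drawing as many samples as there are individuals gives a duplicate fraction of roughly 1/e (about 37%), while drawing a tenth as many gives a much smaller fraction, which matches the intuition in the original question. If sampling is without replacement, the same individual can never recur and this formula does not apply.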