Correlation limits for binary variates

Mentz114 · Feb 22, 2016

I've been looking at detector coincidences and trying to find what general limits apply to them. I was surprised how simply the calculation works out. My question is whether it is correct, and where I can find similar material.

Consider two binary sequences produced by random processes where the probabilities of getting 1 are ##p_1## and ##p_2## respectively. Now assume that the number of 1's in the streams is ##1_n \rightarrow Np_n## as ##N \rightarrow \infty##.

If we know the counts ##1_n,\ 0_n=N-1_n## in our sequences then by a permutation argument it is clear that the number of (0,0) coincidences cannot be greater than the minimum of ##0_1=N(1-p_1)## and ##0_2=N(1-p_2)##. Similarly the maximum possible number of (1,1) coincidences is the lesser of ##Np_1## and ##Np_2##. Assuming ##p_1<p_2##, this gives a maximum total for the (0,0) and (1,1) coincidences of ##S_{12}=N(1-p_2+p_1)##. No permutation gives a greater total than this.

The maximum possible correlation between the streams is given by ##\mathcal{C}_{12}=(2S_{12}-N)/N## which gives ##1-2(p_2-p_1)##.
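The permutation argument can be checked by brute force for a small case. The following is only a sketch: the function names are made up for illustration, and it works with exact counts of 1's rather than the asymptotic ##Np_n##.

```python
from itertools import permutations


def agreement_correlation(a, b):
    """Return (2*S - N)/N, where S counts positions at which a and b agree."""
    n = len(a)
    s = sum(1 for x, y in zip(a, b) if x == y)
    return (2 * s - n) / n


def max_agreement_correlation(ones_1, ones_2, n):
    """Closed form from the argument above: at most n - |ones_1 - ones_2| agreements."""
    s_max = n - abs(ones_1 - ones_2)
    return (2 * s_max - n) / n


def brute_force_max(ones_1, ones_2, n):
    """Try every distinct rearrangement of stream 2 against a fixed stream 1."""
    a = tuple([1] * ones_1 + [0] * (n - ones_1))
    b = [1] * ones_2 + [0] * (n - ones_2)
    return max(agreement_correlation(a, p) for p in set(permutations(b)))


if __name__ == "__main__":
    # N = 6 with two and four 1's, i.e. p_1 = 1/3 and p_2 = 2/3.
    print(brute_force_max(2, 4, 6), max_agreement_correlation(2, 4, 6))
```

With ##p_1=1/3## and ##p_2=2/3## both functions should agree with ##1-2(p_2-p_1)=1/3##.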

From this one can write, for the maximum possible correlations between 4 streams (assuming ##p_1\leq p_2 \leq p_3 \leq p_4##),

##|\mathcal{C}_{12}+\mathcal{C}_{23}+\mathcal{C}_{34}-\mathcal{C}_{41}| \leq 2##

the ##p_n## terms conveniently cancelling.
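Written out, the cancellation looks like this. The sketch assumes each pairwise maximum has the form ##1-2(p_{\text{larger}}-p_{\text{smaller}})##, so that ##\mathcal{C}_{41}=1-2(p_4-p_1)## under the stated ordering.

```latex
\begin{aligned}
\mathcal{C}_{12}+\mathcal{C}_{23}+\mathcal{C}_{34}
  &= \bigl[1-2(p_2-p_1)\bigr]+\bigl[1-2(p_3-p_2)\bigr]+\bigl[1-2(p_4-p_3)\bigr] \\
  &= 3-2(p_4-p_1), \\
\mathcal{C}_{12}+\mathcal{C}_{23}+\mathcal{C}_{34}-\mathcal{C}_{41}
  &= 3-2(p_4-p_1)-\bigl[1-2(p_4-p_1)\bigr] = 2 .
\end{aligned}
```

So the four pairwise maxima, taken together, give exactly 2 for this combination.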

chiro · Feb 22, 2016

Hey Mentz114.

I think it would be useful to define a test statistic that can be decided on regarding whether co-incidences exist and then to use that to evaluate the hypothesis of co-incidences.

If you can do that then you will have a far better chance of understanding and estimating this attribute in your random sample.

Strictly speaking, the first thing to do would probably be to assess the sample for independence. Independence means that any conditional probability equals the probability of the original random variable (not the one being conditioned on).

There are statistical tests to do this - and I think one involves chi-square.

http://www.stat.wmich.edu/s216/book/node112.html
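As a rough illustration of such a test, here is a minimal sketch of the Pearson chi-square statistic for a 2x2 table built from two binary sequences. The function name is hypothetical, no continuity correction is applied, and the threshold mentioned afterwards is the usual 5% critical value for one degree of freedom.

```python
def chi_square_2x2(a, b):
    """Pearson chi-square statistic for independence of two binary sequences."""
    n = len(a)
    # Observed counts for each (a, b) outcome pair.
    observed = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for x, y in zip(a, b):
        observed[(x, y)] += 1
    # Marginal totals for each sequence.
    row = {i: observed[(i, 0)] + observed[(i, 1)] for i in (0, 1)}
    col = {j: observed[(0, j)] + observed[(1, j)] for j in (0, 1)}
    if 0 in row.values() or 0 in col.values():
        raise ValueError("a constant sequence gives an empty margin")
    stat = 0.0
    for (i, j), count in observed.items():
        expected = row[i] * col[j] / n  # expected count under independence
        stat += (count - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    balanced = chi_square_2x2([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25)
    identical = chi_square_2x2([0, 1] * 50, [0, 1] * 50)
    print(balanced, identical)
```

A statistic above roughly 3.84 would reject independence at the 5% level. The first example has every cell equal to its expected count, and the second compares a sequence with itself.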

Basically if correlation exists it can exist in many forms but the independence test is the first thing to ascertain evidence of whether hidden correlations may exist.

The other way is to partition the random variables and decompose them based on their correlation - something that happens in a Principal Component Analysis (or PCA). If information is independent then the decomposition should yield what was initially there to start off with and you won't be able to reduce the dimension of the system without significantly impacting its ability to capture variation.

Mentz114 · Feb 23, 2016

chiro said:

Hey Mentz114.

I think it would be useful to define a test statistic that can be decided on regarding whether co-incidences exist and then to use that to evaluate the hypothesis of co-incidences.
[..]

Chiro,

thanks for the reply. I think you might be misunderstanding what I'm doing. The theoretical limits on correlations are not (it seems) a very interesting subject, but they do crop up; see for instance Bell notes and wiki CHSH.
I could be in the wrong sub-forum ...

FactChecker · Feb 23, 2016

Mentz114 said:

the maximum number of (0,0) coincidences can not be greater than the minimum of ##0_1=N(1-p_1)## and ##0_2=N(1-p_2)##.

Maybe I am misunderstanding you, but I would say this is wrong. You are ignoring the possibility of an unlikely event. Although it is unlikely, they can both be 0 for all N trials as long as there is any possibility (i.e. neither p₁ nor p₂ being 1).
If the random variables are independent, they have an actual correlation of 0. But even if they are independent, it is possible for a sample to have a correlation anywhere between -1 and 1, inclusive. As the sample size, N, gets large, the probability of sample correlations being far from 0 gets small. But it is always possible to get values anywhere between -1 and 1, inclusive.
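A small enumeration makes the point concrete. This sketch uses the agreement-based measure ##(2S-N)/N## from the first post rather than the Pearson coefficient (which is undefined for a constant sequence), and the function names are made up for illustration.

```python
from itertools import product


def agreement_correlation(a, b):
    """Return (2*S - N)/N, where S counts positions at which a and b agree."""
    n = len(a)
    s = sum(1 for x, y in zip(a, b) if x == y)
    return (2 * s - n) / n


def possible_sample_correlations(n):
    """All values the sample measure can take over every pair of length-n streams."""
    seqs = list(product((0, 1), repeat=n))
    return sorted({agreement_correlation(a, b) for a in seqs for b in seqs})


if __name__ == "__main__":
    print(possible_sample_correlations(4))
```

Every pair of sequences has nonzero probability when ##0<p_1,p_2<1##, so each listed value, including ##\pm 1##, can occur in a finite sample.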

Mentz114 · Feb 24, 2016

FactChecker said:

Maybe I am misunderstanding you, but I would say this is wrong. You are ignoring the possibility of an unlikely event. Although it is unlikely, they can both be 0 for all N trials as long as there is any possibility (i.e. neither p₁ nor p₂ being 1).
If the random variables are independent, they have an actual correlation of 0. But even if they are independent, it is possible for a sample to have a correlation anywhere between -1 and 1, inclusive. As the sample size, N, gets large, the probability of sample correlations being far from 0 gets small. But it is always possible to get values anywhere between -1 and 1, inclusive.

Yes, this is true. I don't think ##1-2(p_2-p_1)## is a limit (except asymptotically) because we have probabilities in the expression.

In fact the first expression I worked out was the multi-stream limit where the probabilities cancel. This is the same as the CHSH inequality which is reckoned to be a true limit.
Can my logic for ##|\mathcal{C}_{12}+\mathcal{C}_{23}+\mathcal{C}_{34}-\mathcal{C}_{41}| \leq 2## be saved because it has no probabilities in it?

I think I'm assuming the same things as the derivation I've attached, which uses set logic.
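One way to probe the finite-sample question is an exhaustive check over short sequences. This is a sketch, not a proof: it assumes all four streams are recorded on the same ##N## trials, uses the agreement measure ##(2S-N)/N## for every pair, and the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def agreement_correlation(a, b):
    """Return (2*S - N)/N exactly, where S counts positions at which a and b agree."""
    n = len(a)
    s = sum(1 for x, y in zip(a, b) if x == y)
    return Fraction(2 * s - n, n)


def four_stream_combination(s1, s2, s3, s4):
    """C_12 + C_23 + C_34 - C_41 for four equal-length binary sequences."""
    return (agreement_correlation(s1, s2)
            + agreement_correlation(s2, s3)
            + agreement_correlation(s3, s4)
            - agreement_correlation(s4, s1))


def max_abs_combination(n):
    """Largest |C_12 + C_23 + C_34 - C_41| over every quadruple of length-n sequences."""
    seqs = list(product((0, 1), repeat=n))
    return max(abs(four_stream_combination(*q)) for q in product(seqs, repeat=4))


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, max_abs_combination(n))
```

The reason behind it is per trial: mapping 0 and 1 to ##\mp 1##, the quantity ##ab+bc+cd-da=b(a+c)+d(c-a)## is always ##\pm 2##, so its average over any number of trials cannot exceed 2 in magnitude.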
