Correlation limits for binary variates

In summary: The conversation discusses the calculation of detector coincidences between binary streams and the limits that apply to them. One reply suggests using a test statistic to evaluate the hypothesis of coincidences and to check for hidden correlations. Another points out that unlikely events can occur in any finite sample, so a limit expressed in terms of the probabilities holds only asymptotically. Finally, the conversation mentions the CHSH inequality as a true limit for multi-stream coincidences.
  • #1
Mentz114
I've been looking at detector coincidences and tried to find what general limits apply to them. I was surprised how simply the calculation works out. My question is whether it is correct, and where I can find similar material.

Consider two binary sequences produced by random processes where the probabilities of getting 1 are ##p_1## and ##p_2## respectively. Now assume that the number of 1's in the streams is ##1_n \rightarrow Np_n## as ##N \rightarrow \infty##.

If we know the counts ##1_n,\ 0_n=N-1_n## in our sequences then by a permutation argument it is clear that the maximum number of (0,0) coincidences cannot be greater than the minimum of ##0_1=N(1-p_1)## and ##0_2=N(1-p_2)##. Similarly, the maximum possible number of (1,1) coincidences is the lesser of ##Np_1## and ##Np_2##. Assuming ##p_1<p_2##, this gives a total for the (0,0) and (1,1) coincidences of ##S_{12}=N(1-p_2)+Np_1=N(1-p_2+p_1)##. There are no permutations which give a greater total than this.

The maximum possible correlation between the streams is given by ##\mathcal{C}_{12}=(2S_{12}-N)/N## which gives ##1-2(p_2-p_1)##.
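A minimal sketch of the counting argument above, assuming the counts are fixed exactly at ##Np_n## (the function names and the example values ##N=1000, p_1=0.3, p_2=0.45## are illustrative, not from the original post). It compares the formula ##1-2(p_2-p_1)## with the count-based bound and with one explicit arrangement (both streams sorted) that attains it.

```python
from fractions import Fraction

def max_agreement_correlation(ones_a: int, ones_b: int, n: int) -> Fraction:
    """Largest (agreements - disagreements)/n over all reorderings of two
    binary streams of length n with the given numbers of 1s."""
    max_11 = min(ones_a, ones_b)          # (1,1) pairs are capped by the scarcer 1s
    max_00 = min(n - ones_a, n - ones_b)  # (0,0) pairs are capped by the scarcer 0s
    s = max_11 + max_00                   # S_12 in the post
    return Fraction(2 * s - n, n)

def sorted_arrangement_correlation(ones_a: int, ones_b: int, n: int) -> Fraction:
    """Build one explicit arrangement (1s first in both streams) and count agreements."""
    a = [1] * ones_a + [0] * (n - ones_a)
    b = [1] * ones_b + [0] * (n - ones_b)
    agree = sum(x == y for x, y in zip(a, b))
    return Fraction(2 * agree - n, n)

# Illustrative values only (assumed, not from the thread).
n, p1, p2 = 1000, Fraction(3, 10), Fraction(45, 100)
ones_1, ones_2 = int(n * p1), int(n * p2)

c_formula = 1 - 2 * (p2 - p1)
c_counts = max_agreement_correlation(ones_1, ones_2, n)
c_sorted = sorted_arrangement_correlation(ones_1, ones_2, n)
print(c_formula, c_counts, c_sorted)  # 7/10 7/10 7/10
```

Exact fractions are used so that the three values can be compared without rounding error.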

From this one can write, for the maximum possible correlations between 4 streams (assuming ##p_1\leq p_2 \leq p_3 \leq p_4##):

##|\mathcal{C}_{12}+\mathcal{C}_{23}+\mathcal{C}_{34}-\mathcal{C}_{41}| \leq 2##

the ##p_n## terms conveniently cancelling.
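The cancellation can be checked numerically. This is only a sketch: it plugs the pairwise maximum ##1-2(p_j-p_i)## into the four-term combination for one assumed, ordered set of probabilities (0.2, 0.3, 0.6, 0.9, chosen purely for illustration). It does not show that the four pairwise maxima can all be reached by the same set of streams.

```python
from fractions import Fraction

def c_max(p_low: Fraction, p_high: Fraction) -> Fraction:
    """Pairwise maximum from the post, valid for p_low <= p_high."""
    return 1 - 2 * (p_high - p_low)

# Illustrative ordered probabilities (assumed values).
p1, p2, p3, p4 = (Fraction(k, 10) for k in (2, 3, 6, 9))

total = c_max(p1, p2) + c_max(p2, p3) + c_max(p3, p4) - c_max(p1, p4)
print(total)  # 2
```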
 
  • #2
Hey Mentz114.

I think it would be useful to define a test statistic that can be decided on regarding whether co-incidences exist and then to use that to evaluate the hypothesis of co-incidences.

If you can do that then you will have a far better chance of understanding and estimating this attribute in your random sample.

Strictly speaking, the first thing to do would probably be to assess the sample for independence. Independence means that any conditional probability equals the unconditional probability of the variable in question (not the one being conditioned on).

There are statistical tests to do this - and I think one involves chi-square.

http://www.stat.wmich.edu/s216/book/node112.html

Basically if correlation exists it can exist in many forms but the independence test is the first thing to ascertain evidence of whether hidden correlations may exist.
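As a rough illustration of the chi-square independence test mentioned above, here is a sketch for a 2x2 table of pair counts from two binary streams. The function name and the example counts are hypothetical. It uses the plain Pearson statistic with no continuity correction, assumes no row or column total is zero, and relies on the fact that for one degree of freedom the chi-square tail probability equals `erfc(sqrt(x/2))`.

```python
import math

def chi_square_2x2(n00: int, n01: int, n10: int, n11: int):
    """Pearson chi-square test of independence for a 2x2 table.

    nXY is the number of trials where stream 1 gave X and stream 2 gave Y.
    Returns (statistic, p_value) with one degree of freedom.
    """
    n = n00 + n01 + n10 + n11
    row0, row1 = n00 + n01, n10 + n11
    col0, col1 = n00 + n10, n01 + n11
    observed = [n00, n01, n10, n11]
    # Expected counts if the two streams were independent.
    expected = [row0 * col0 / n, row0 * col1 / n,
                row1 * col0 / n, row1 * col1 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square tail, 1 dof
    return stat, p_value

# Hypothetical counts: many more agreements than independence would predict.
stat, p_value = chi_square_2x2(40, 10, 10, 40)
print(stat, p_value)

# Hypothetical counts that match independence exactly.
stat_indep, p_indep = chi_square_2x2(25, 25, 25, 25)
print(stat_indep, p_indep)
```

A small p-value is evidence against independence; it says nothing about how close the observed coincidence count is to its permutation maximum.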

The other way is to partition the random variables and decompose them based on their correlation - something that happens in a Principal Component Analysis (or PCA). If information is independent then the decomposition should yield what was initially there to start off with and you won't be able to reduce the dimension of the system without significantly impacting its ability to capture variation.
 
  • #3
chiro said:
Hey Mentz114.

I think it would be useful to define a test statistic that can be decided on regarding whether co-incidences exist and then to use that to evaluate the hypothesis of co-incidences.
[..]
Chiro,

thanks for the reply. I think you might be misunderstanding what I'm doing. The theoretical limits on correlations are not (it seems) a very interesting subject, but they crop up; see, for instance, Bell notes and wiki CHSH.
I could be in the wrong sub-forum ...
 
  • #4
Mentz114 said:
the maximum number of (0,0) coincidences can not be greater than the minimum of ##0_1=N(1-p_1)## and ##0_2=N(1-p_2)##.
Maybe I am misunderstanding you, but I would say this is wrong. You are ignoring the possibility of an unlikely event. Although it is unlikely, both streams can be 0 for all N trials as long as that has nonzero probability (i.e. neither ##p_1## nor ##p_2## is 1).
If the random variables are independent, they have an actual correlation of 0. But even if they are independent, it is possible for a sample to have a correlation anywhere between -1 and 1, inclusive. As the sample size, N, gets large, the probability of sample correlations being far from 0 gets small. But it is always possible to get values anywhere between -1 and 1, inclusive.
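To put a number on "unlikely but possible", this sketch computes the exact probability that two independent streams agree on every one of N trials, which is the case where the sample agreement correlation ##(2S-N)/N## equals 1. The function name and the values 0.3 and 0.45 are assumptions for illustration.

```python
def prob_all_agree(p1: float, p2: float, n: int) -> float:
    """Probability that two independent binary streams agree on all n trials."""
    per_trial = p1 * p2 + (1 - p1) * (1 - p2)  # P(both 1) + P(both 0)
    return per_trial ** n

# Illustrative probabilities (assumed values).
for n in (1, 10, 100):
    print(n, prob_all_agree(0.3, 0.45, n))
```

The probability is strictly positive for every finite N but shrinks geometrically, which matches the point that extreme sample correlations are always possible yet become rare as N grows.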
 
  • #5
FactChecker said:
Maybe I am misunderstanding you, but I would say this is wrong. You are ignoring the possibility of an unlikely event. Although it is unlikely, both streams can be 0 for all N trials as long as that has nonzero probability (i.e. neither ##p_1## nor ##p_2## is 1).
If the random variables are independent, they have an actual correlation of 0. But even if they are independent, it is possible for a sample to have a correlation anywhere between -1 and 1, inclusive. As the sample size, N, gets large, the probability of sample correlations being far from 0 gets small. But it is always possible to get values anywhere between -1 and 1, inclusive.
Yes, this is true. I don't think ##1-2(p_2-p_1)## is a limit (except asymptotically) because we have probabilities in the expression.

In fact the first expression I worked out was the multi-stream limit where the probabilities cancel. This is the same as the CHSH inequality which is reckoned to be a true limit.
Can my logic for ##|\mathcal{C}_{12}+\mathcal{C}_{23}+\mathcal{C}_{34}-\mathcal{C}_{41}| \leq 2## be saved because it has no probabilities in it?
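One way to probe this question, offered as a sketch rather than as the derivation in the attached paper: map each outcome to ±1, so that an agreement contributes +1 and a disagreement -1 to ##\mathcal{C}_{ij}##. If all four streams are recorded on every trial (an assumption of this sketch), the per-trial value of the four-term combination can be enumerated over all 16 outcome patterns, and the sample combination is just the average of those per-trial values. The helper names, the seed and the stream length are illustrative.

```python
from itertools import product
import random

def four_term(x1: int, x2: int, x3: int, x4: int) -> int:
    """Per-trial value of C12 + C23 + C34 - C41 for outcomes in {-1, +1}."""
    return x1 * x2 + x2 * x3 + x3 * x4 - x4 * x1

# Enumerate every possible joint outcome of the four streams on one trial.
per_trial_values = {four_term(*xs) for xs in product((-1, 1), repeat=4)}
print(sorted(per_trial_values))  # [-2, 2]

def agreement_correlation(a, b) -> float:
    """(agreements - disagreements) / N, i.e. (2S - N)/N as in the post."""
    n = len(a)
    agree = sum(x == y for x, y in zip(a, b))
    return (2 * agree - n) / n

# Sanity check on arbitrary sample streams (seeded for repeatability).
rng = random.Random(0)
s1, s2, s3, s4 = ([rng.randint(0, 1) for _ in range(500)] for _ in range(4))
combo = (agreement_correlation(s1, s2) + agreement_correlation(s2, s3)
         + agreement_correlation(s3, s4) - agreement_correlation(s4, s1))
print(abs(combo) <= 2)  # True
```

Because every trial contributes exactly +2 or -2, the average over any finite sample lies between -2 and 2 whatever the ##p_n## are, so under this assumption the bound holds for samples and not only asymptotically.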

I think I'm assuming the same things as the derivation I've attached, which uses set logic.
 

Attachments

  • Maccone-Bell-paper.pdf

Related to Correlation limits for binary variates

1. What is a binary variate?

A binary variate is a type of categorical variable that can only take two values, such as yes/no or 0/1. It is used to represent data that can be classified into two distinct groups or categories.

2. How is correlation calculated for binary variates?

Correlation for binary variates is typically calculated using a measure called phi coefficient or Pearson's phi, which measures the association or dependence between two binary variables. It ranges from -1 to 1, with 0 indicating no correlation and values closer to -1 or 1 indicating a strong negative or positive correlation, respectively.
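A short sketch of the phi coefficient for a 2x2 table of counts, using the standard formula ##\phi = (n_{11}n_{00}-n_{10}n_{01})/\sqrt{n_{1\cdot}n_{0\cdot}n_{\cdot 1}n_{\cdot 0}}##. The function name and the example counts are hypothetical, and the sketch assumes no row or column total is zero.

```python
import math

def phi_coefficient(n00: int, n01: int, n10: int, n11: int) -> float:
    """Phi coefficient of a 2x2 table; nXY counts pairs (X, Y)."""
    row0, row1 = n00 + n01, n10 + n11
    col0, col1 = n00 + n10, n01 + n11
    denom = math.sqrt(row0 * row1 * col0 * col1)
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical counts with equal marginals.
phi = phi_coefficient(40, 10, 10, 40)
print(phi)  # 0.6
```

Note that phi is the Pearson correlation of the 0/1 values, which coincides with the agreement measure ##(2S-N)/N## only in special cases such as both streams having equal numbers of 0s and 1s.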

3. What are the limitations of using correlation for binary variates?

One limitation is that correlation measures cannot determine causation, only association. Additionally, the attainable range of the correlation depends on the marginal proportions: when the two variables have unequal proportions of 1s, the coefficient cannot reach 1 even under the strongest possible association.

4. How do correlation limits for binary variates differ from those for continuous variables?

Correlation limits for binary variates are different from those for continuous variables because binary variables can only take two values, whereas continuous variables can take on an infinite number of values. This means that the range of possible correlations for binary variates is limited compared to that for continuous variables.

5. In what situations would it be appropriate to use correlation for binary variates?

Correlation for binary variates is appropriate when the two variables being studied are both binary and there is a clear reason to examine their association. It can also be used as a preliminary analysis to determine if further research is needed to explore the relationship between the variables in more depth.
