Question on Pearson's Chi-squared test

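In summary: The [itex]O_i[/itex] are counts of observations falling in each cell, so [itex](O_1,\ldots,O_k)[/itex] follows a multinomial distribution with [itex]\mathrm{Var}(O_i) = np_i(1-p_i)[/itex], not [itex]np_i[/itex]. The test therefore does not assume that variance equals expected value. Dividing by [itex]E_i[/itex] instead of the variance is compensated by the negative correlation among the counts (they must sum to n), which is why, for large expected counts and a fully specified hypothesis, the statistic is compared with a Chi-squared distribution with k − 1 degrees of freedom.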
  • #1
mnb96
Hello,

I was trying to interpret the formula of Pearson's Chi-squared test:
[tex]\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}[/tex]

I thought that if we assume that each [itex]O_i[/itex] is an observation of the random variable [itex]X_i[/itex], then the above formula essentially considers the sum of squares of n standardized random variables [itex]Y_i=\frac{X_i-\mu_i}{\sigma_i}[/itex]. In fact, if the [itex]Y_i[/itex] are independent with [itex]Y_i \sim N(0,1)[/itex], then the random variable [itex]S = \sum_{i=1}^n Y_i^2[/itex] follows a [itex]\chi^2[/itex]-distribution with n degrees of freedom. Thus, the Chi-squared test would essentially evaluate the tail probability [itex]\mathrm{P}\left( S \geq \chi^2 \right)[/itex] (the p-value) and compare it to some chosen significance level.
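To make this concrete, the sketch below computes the statistic as a sum of squared standardized residuals [itex](O_i - E_i)/\sqrt{E_i}[/itex] and estimates the tail probability [itex]\mathrm{P}(S \geq x)[/itex] by simulating S as a sum of squared independent standard normals. The counts are hypothetical, and Monte Carlo stands in for a library Chi-squared CDF; for 2 degrees of freedom the exact tail is [itex]e^{-x/2}[/itex], which gives a simple check.

```python
import math
import random


def pearson_statistic(observed, expected):
    """Sum of squared standardized residuals (O_i - E_i) / sqrt(E_i)."""
    return sum(((o - e) / math.sqrt(e)) ** 2 for o, e in zip(observed, expected))


def tail_probability_mc(x, df, trials=100_000, seed=0):
    """Estimate P(S >= x), where S is a sum of `df` squared independent N(0, 1) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        if s >= x:
            hits += 1
    return hits / trials


# Hypothetical counts in 4 cells, for illustration only.
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]
stat = pearson_statistic(observed, expected)
p_value = tail_probability_mc(stat, df=3)  # df = k - 1 for a fully specified hypothesis
```

Which number of degrees of freedom to use is exactly the point at issue here: for a goodness-of-fit test with k cells and fully specified expected counts, the usual reference distribution has k − 1 degrees of freedom, not k.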

My question is about the standardization of the random variables [itex]X_i[/itex].
If my interpretation above is correct, then Pearson's Chi-squared test somehow assumes that each random variable [itex]X_i[/itex] has variance equal to its expected value, that is: [tex]\sigma_i^2 = \mu_i[/tex]

Why so?
Can anybody explain why we would need to assume that variance and expected value are numerically equal? That condition is satisfied only for some distributions, like the Poisson and the Gamma (with [itex]\theta=1[/itex]). Why such a restriction?
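For reference, both examples can be checked from their first two moments (standard results; the Gamma distribution is assumed here to use the shape k, scale θ parameterization, which the post does not state):

[tex]X \sim \mathrm{Poisson}(\lambda):\quad \mathrm{E}[X] = \lambda,\quad \mathrm{Var}[X] = \lambda[/tex]
[tex]X \sim \mathrm{Gamma}(k, \theta):\quad \mathrm{E}[X] = k\theta,\quad \mathrm{Var}[X] = k\theta^2,\quad \text{so } \mathrm{Var}[X] = \mathrm{E}[X] \iff \theta = 1[/tex]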
 
  • #2
mnb96 said:
if we assume that each [itex]O_i[/itex] is an observation of the random variable [itex]X_i[/itex]

The [itex] O_i [/itex] are supposed to be a count of how many observations of a random variable fall within a "cell". How are you defining the [itex] i[/itex]th cell?
 
  • #3
Stephen Tashi said:
The [itex] O_i [/itex] are supposed to be a count of how many observations of a random variable fall within a "cell".

I see! That is an important observation. It probably means that the random variables [itex] X_i [/itex] are supposed to follow a multinomial distribution.

For instance, if we have only one cell, then [itex]X_1[/itex] could be the number of successes out of [itex]m[/itex] independent trials of some experiment. Thus, [itex] X_1 [/itex] would follow a binomial distribution, which approaches a Poisson distribution when [itex]m[/itex] is large and the success probability is small (with [itex]mp = \lambda[/itex] held fixed), and a Poisson distribution has [itex]\sigma^2=\mu=\lambda[/itex].
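A quick numerical check of that limit, as a sketch: it holds [itex]\lambda = mp[/itex] fixed (here a hypothetical [itex]\lambda = 3[/itex]) and measures the largest pointwise gap between the Binomial([itex]m, \lambda/m[/itex]) and Poisson([itex]\lambda[/itex]) probability mass functions. Note that the binomial variance-to-mean ratio is exactly [itex]1 - p[/itex], so it approaches 1 only when p is small; with p fixed and m large, the variance stays a fixed fraction of the mean.

```python
import math


def binomial_pmf(j, m, p):
    # math.comb(m, j) is 0 for j > m, so the pmf is 0 outside the support.
    return math.comb(m, j) * p**j * (1 - p) ** (m - j)


def poisson_pmf(j, lam):
    return math.exp(-lam) * lam**j / math.factorial(j)


def max_pmf_gap(m, lam, support=30):
    """Largest |Binomial(m, lam/m) - Poisson(lam)| pmf difference over j = 0..support."""
    p = lam / m
    return max(abs(binomial_pmf(j, m, p) - poisson_pmf(j, lam)) for j in range(support + 1))


gaps = [max_pmf_gap(m, 3.0) for m in (10, 100, 1000)]
```

The gap shrinks as m grows with mp fixed, which is the regime where [itex]\sigma^2 \approx \mu[/itex] holds for a single binomial count.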

If the above reasoning is correct, then Pearson's Chi-squared test should work only when the number of trials is sufficiently large.
 
  • #4
mnb96 said:
It probably means that the random variables [itex] X_i [/itex] are supposed to follow a multinomial distribution.

I'm not sure what you mean by that statement.

The test can be applied to repeated independent samples of a single random variable. The single random variable can have any distribution. It is only necessary to define the cells so that they partition the range of the random variable.
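A minimal sketch of that procedure, under stated assumptions: the data are simulated, the hypothesized distribution is a fully specified N(0, 1) (no parameters estimated from the data), and the cell boundaries are hypothetical. The cells partition the whole real line, so every observation lands in exactly one of them.

```python
import random
from statistics import NormalDist


def bin_counts(samples, edges):
    """Count samples in cells (-inf, e0), [e0, e1), ..., [e_last, inf); edges must be sorted."""
    counts = [0] * (len(edges) + 1)
    for x in samples:
        i = sum(1 for e in edges if x >= e)  # index of the cell containing x
        counts[i] += 1
    return counts


def expected_counts(n, edges, cdf):
    """Expected counts n * P(cell) under the hypothesized CDF."""
    probs, prev = [], 0.0
    for e in edges:
        c = cdf(e)
        probs.append(c - prev)
        prev = c
    probs.append(1.0 - prev)
    return [n * p for p in probs]


rng = random.Random(1)
n = 1000
samples = [rng.gauss(0.0, 1.0) for _ in range(n)]  # simulated data
edges = [-1.0, 0.0, 1.0]  # hypothetical boundaries giving 4 cells
observed = bin_counts(samples, edges)
expected = expected_counts(n, edges, NormalDist(0.0, 1.0).cdf)
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

With 4 cells and no estimated parameters, the usual reference distribution is [itex]\chi^2[/itex] with 3 degrees of freedom; estimating parameters of the hypothesized distribution from the same data would, in the usual treatment, reduce that further.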
 
  • #5
Hi Stephen, and thanks for your help!

What I meant is that [itex]X_i[/itex] is a random variable that "counts" the number of observations that happened to fall into the i-th cell. For instance, if we consider a continuous random variable Z having some unknown probability density function, and we partition the real line into two cells corresponding to the events [itex]Z\geq 10[/itex] (=success) and [itex]Z< 10[/itex] (=failure), then the two events will have probabilities p and (1-p).

We can sample the random variable Z many times, say n times.
Now, [itex]X_1[/itex] is the random variable that counts the total number of successes, thus [itex]X_1[/itex] follows a binomial distribution, i.e. [itex]X_1\sim B(n,p)[/itex].
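With only the two cells of this example, the question of dividing by [itex]E_i[/itex] can be checked directly. Here [itex]O_1 = X_1[/itex], [itex]O_2 = n - X_1[/itex], [itex]E_1 = np[/itex] and [itex]E_2 = n(1-p)[/itex], and since [itex](n - X_1) - n(1-p) = -(X_1 - np)[/itex], the two terms combine as:

[tex]\chi^2 = \frac{(X_1 - np)^2}{np} + \frac{(X_1 - np)^2}{n(1-p)} = (X_1 - np)^2\left(\frac{1}{np} + \frac{1}{n(1-p)}\right) = \frac{(X_1 - np)^2}{np(1-p)}[/tex]

So the statistic is exactly the square of the standardized binomial count [itex](X_1 - np)/\sqrt{np(1-p)}[/itex], which by the central limit theorem is approximately N(0,1) for large n. The correct binomial variance [itex]np(1-p)[/itex] ends up in the denominator even though each term divides only by [itex]E_i[/itex]; no assumption [itex]\sigma^2 = \mu[/itex] is involved, and the limiting distribution is [itex]\chi^2[/itex] with k − 1 = 1 degree of freedom.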

I thought that if we extend this reasoning to [itex]k[/itex] cells, then the vector of random variables [itex](X_1,\ldots,X_k)[/itex] should follow a multinomial distribution, i.e. [itex](X_1,\ldots,X_k) \sim M(n;p_1,\ldots,p_k)[/itex].
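The same effect holds for k cells. Under [itex]M(n;p_1,\ldots,p_k)[/itex], each term has expectation [itex]\mathrm{E}\left[(X_i - np_i)^2\right]/(np_i) = np_i(1-p_i)/(np_i) = 1 - p_i[/itex], so the statistic has mean [itex]\sum_i (1 - p_i) = k - 1[/itex], even though no cell count has variance equal to its mean. The simulation below is a sketch of that check; n, the cell probabilities, the number of repetitions and the seed are arbitrary choices.

```python
import random
from statistics import fmean, pvariance


def simulate_pearson(n, probs, reps=10_000, seed=2):
    """Simulate (X_1, ..., X_k) ~ M(n; probs) `reps` times.

    Returns the average Pearson statistic and the empirical variance of X_1.
    """
    rng = random.Random(seed)
    num_cells = len(probs)
    expected = [n * p for p in probs]
    stats, first_counts = [], []
    for _ in range(reps):
        counts = [0] * num_cells
        # Each of the n observations lands in cell i with probability probs[i].
        for cell in rng.choices(range(num_cells), weights=probs, k=n):
            counts[cell] += 1
        stats.append(sum((o - e) ** 2 / e for o, e in zip(counts, expected)))
        first_counts.append(counts[0])
    return fmean(stats), pvariance(first_counts)


mean_stat, var_x1 = simulate_pearson(50, [0.5, 0.3, 0.2])
# Theory: mean of the statistic = k - 1 = 2; Var(X_1) = n p_1 (1 - p_1) = 12.5, not n p_1 = 25.
```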

Or am I misunderstanding something?
 

Related to Question on Pearson's Chi-squared test

1. What is Pearson's Chi-squared test?

Pearson's Chi-squared test is a statistical test for categorical (count) data. It is used in two main ways: as a goodness-of-fit test, comparing the observed counts in k categories with the counts expected under a hypothesized distribution, and as a test of independence, comparing the observed frequencies in a contingency table with the frequencies expected if the two categorical variables were independent.
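For the contingency-table case, the expected count in row i and column j under independence is (row i total) × (column j total) / N, and the statistic has (r − 1)(c − 1) degrees of freedom. A minimal sketch with hypothetical tables:

```python
def contingency_chi2(table):
    """Pearson statistic and degrees of freedom for an r x c table of counts,
    using expected counts E_ij = (row_i total)(col_j total) / N under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


stat, dof = contingency_chi2([[10, 20], [20, 10]])  # hypothetical 2 x 2 table
```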

2. What is the purpose of using Pearson's Chi-squared test?

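The purpose of using Pearson's Chi-squared test is to determine whether observed counts differ from expected counts by more than chance variation would explain: either whether data are consistent with a hypothesized distribution, or whether there is a significant relationship between two categorical variables. This can help researchers to understand the underlying patterns and associations in their data.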

3. How do you interpret the results of a Pearson's Chi-squared test?

The results of a Pearson's Chi-squared test are typically reported as the value of the statistic, its degrees of freedom, and a p-value: the probability of obtaining a statistic at least as large as the observed one if the null hypothesis were true. If the p-value is less than the chosen significance level (often 0.05), the null hypothesis (independence, or the hypothesized distribution in a goodness-of-fit test) is rejected, and the data are taken as evidence of a significant relationship or of lack of fit. If the p-value is greater than the significance level, there is not enough evidence to reject the null hypothesis.
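For an even number of degrees of freedom df = 2m, the upper-tail probability has a closed form, [itex]\mathrm{P}(S > x) = e^{-x/2}\sum_{j=0}^{m-1} (x/2)^j / j![/itex], which makes the p-value and the decision easy to compute by hand. The sketch below uses it; the function names are hypothetical, and odd degrees of freedom need the incomplete gamma function, so in practice a library routine (e.g. SciPy's `scipy.stats.chi2.sf`) would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Exact P(S > x) for S ~ chi-squared with an even df = 2m:
    exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        if j > 0:
            term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def decide(x, df, alpha=0.05):
    """Return the p-value and the test decision at significance level alpha."""
    p = chi2_sf_even_df(x, df)
    return p, ("reject H0" if p < alpha else "do not reject H0")
```

The familiar 5% critical values (about 5.991 for 2 degrees of freedom and 9.488 for 4) give p-values of about 0.05 with this formula.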

4. What are the assumptions of Pearson's Chi-squared test?

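The main assumptions of Pearson's Chi-squared test are that the observations are independent (for example, collected by random sampling), that each observation falls into exactly one category, and that the sample is large enough for the Chi-squared approximation to hold; a common rule of thumb is that the expected frequency in each cell should be at least 5. If these assumptions are violated, the results of the test may not be reliable.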

5. Can Pearson's Chi-squared test be used for continuous variables?

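Not directly: the test works with counts in categories. A continuous variable can still be tested by partitioning its range into cells and counting how many observations fall in each, as in a goodness-of-fit test, although this discards information about where observations fall within a cell. For comparing means of continuous variables, other tests such as t-tests or ANOVA are more common.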
