When not to use the Student t test?

Monique · May 11, 2007

A Student t test assumes normally distributed data with equal variances.
I know you can test for normality with the Kolmogorov-Smirnov test and compare the variances with the F-test.

When the data are not normal, you use a non-parametric test (the Mann-Whitney test); when the variances are significantly different, you use the Welch-corrected t test.

How strictly should I follow those rules?
According to this site (http://www.graphpad.com/articles/interpret/Analyzing_two_groups/choos_anal_comp_two.htm), the rules work well for >100 samples and work poorly for <12 samples. How about the region in between?

I have sample sets of n around 20, and some are not normally distributed. Can I go ahead and do a t test, or should I maybe log transform all the data before doing the t test? Or do a Mann-Whitney test?

Thanks for your input. Here is a graph with the data distribution for the 4 samples, together with the 95% CI:
http://img301.imageshack.us/img301/9940/scatter95cifg4.jpg

EnumaElish · May 11, 2007

The question of equal variances is easy: there is a variant of the t-test designed for unequal variances. For example, proc ttest in SAS will produce one statistic under H₀: equal variances, and another statistic under unequal variances, and it will also test for equality of the variances.
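
As a rough illustration of the two statistics mentioned above, here is a minimal Python sketch of the pooled (equal-variance) and Welch (unequal-variance) t statistics with their degrees of freedom. The function names are made up for this example, it is not a reproduction of what SAS does internally, and it stops short of computing p-values.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (n - 1)

def pooled_t(x, y):
    """Student t statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(x, y):
    """Welch t statistic for unequal variances; returns (t, df)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Made-up illustration data: same n, clearly different spread.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
print(pooled_t(a, b))
print(welch_t(a, b))
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ; with unequal group sizes and unequal variances the statistics themselves diverge as well.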

A first "gut" reaction to the question of normality is that you should use both types of tests (parametric and non-parametric). If the results agree, no worry. You need to think further only if their results turn out differently from each other.
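
For the non-parametric half of that comparison, a bare-bones Mann-Whitney U test might look like the sketch below. It assumes a two-sided test and uses the large-sample normal approximation with no tie or continuity correction, so treat the p-value as approximate; the function name is hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p-value) for samples x and y.

    U counts the pairs (xi, yj) with xi > yj, with ties counted as 0.5.
    The p-value uses the normal approximation without tie or
    continuity correction (an assumption made for brevity).
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
            for xi in x for yj in y)
    mu = n1 * n2 / 2                                 # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # sd of U under H0
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided normal tail
    return u, p

# Made-up illustration data.
print(mann_whitney_u([1.1, 2.3, 2.9, 4.0], [3.5, 5.2, 6.1, 7.8]))
```

If this p-value and the t test p-value fall on the same side of your significance level, the choice of test is not driving the conclusion.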

The data look as if a logarithmic transformation would do the trick, esp. for the 3rd and the 4th samples.

What I would have done is to estimate the linear regression Log(Y) = a + b₂ d₂ + ... + b₄ d₄ + ε, where d_i = 1 if Y is in the i'th sample (i = 2, 3, 4), d_i = 0 otherwise; the b's are the parameters to be estimated, and ε is the error term. Each b_i represents the difference between the mean of Log(Y) in the i'th sample and the mean of Log(Y) in the control sample. In this model, the first sample is made the control group by having been excluded from the regression, but one can easily change that. I'd first run this as an unweighted regression; alternatively I'd run a weighted regression to control for unequal variances (a problem technically known as heteroscedasticity).
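
Here is a minimal sketch of the unweighted version of that regression, using only the Python standard library and solving the normal equations directly. The helper names and the toy data are made up for illustration; the weighted variant and the standard errors are left out.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def dummy_regression(groups):
    """OLS fit of log(Y) = a + b2*d2 + ... + bk*dk.

    groups[0] is the control sample. Returns [a, b2, ..., bk].
    """
    k = len(groups)
    rows, ys = [], []
    for i, g in enumerate(groups):
        for y in g:
            # Intercept column, then one dummy per non-control sample.
            rows.append([1.0] + [1.0 if j == i else 0.0 for j in range(1, k)])
            ys.append(math.log(y))
    XtX = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    Xty = [sum(r[p] * y for r, y in zip(rows, ys)) for p in range(k)]
    return solve(XtX, Xty)

# Toy data (not the poster's): four samples, the first is the control.
samples = [[1.2, 0.8, 1.5], [2.1, 1.7, 2.6], [4.0, 9.5, 3.1], [12.0, 5.5, 20.3]]
print(dummy_regression(samples))
```

Because the model has one dummy per non-control sample, the fitted a is just the mean of Log(Y) in the control sample and each b_i is the difference in mean Log(Y) from the control, which is an easy check on the output.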

tacman · May 18, 2007

The Student t test is not appropriate when the data do not meet its assumptions of normality and equal variances. In that case, a non-parametric test such as the Mann-Whitney test is more appropriate. These rules matter because applying a test that assumes normality and equal variances to data that violate those assumptions can lead to incorrect conclusions.

The strictness of these rules can vary depending on the size of your sample. As mentioned in the article you referenced, these rules work well for sample sizes larger than 100, but may not work as well for smaller sample sizes. In your case, with sample sizes around 20, it would be best to err on the side of caution and use a non-parametric test.

In terms of transforming your data, it is generally recommended to only transform data if it is necessary to meet the assumptions of the test being used. In this case, if your data is not normally distributed, it would be appropriate to use the Mann-Whitney test instead of transforming the data and using a t test.

Overall, it is important to carefully consider the assumptions of the test being used and choose the appropriate test for your data. In this case, the Mann-Whitney test would be the most appropriate choice for your sample sizes and non-normal data.
