Basic Statistics question: which test to run?

In summary: To check whether typing speed improved, the thread suggests a two-sample t-test that compares the mean wpm of the first half of the races with the mean wpm of the second half. A t-test is the usual choice because the population standard deviation is unknown, although with samples this large a z-test gives nearly the same result. A 95% confidence interval is a different tool: it estimates a parameter such as the mean wpm rather than testing a hypothesis. Assuming that you have the data in an Excel spreadsheet, the starting point is to find the mean wpm of the first half of your data and the mean wpm of the second half of your data.
  • #1
james121515
Hi everyone!

Lately I have been trying to improve my typing speed, and have been playing a game called Type Race, where you type various short passages (the passages are selected at random from a text bank) and your score in WPM is recorded. What I want to do is determine whether or not my typing speed has improved as a result of playing the game. In particular, I want to test the hypothesis that my scores have increased since I first started playing, statistically speaking.

Right now I have completed 2821 races, with a mean of 102.4 wpm and a standard deviation of 11.14 wpm. I have all the results saved in an Excel spreadsheet. I graphed my results with a histogram and noticed that my “population” of scores is quite normal. So, given this description of what I want to do, what would be the strongest significance test to run on this data?

With my very limited knowledge of statistics, I was thinking I could take two random samples, one from the first half of my scores and another from the second half, and test the hypothesis that there has been a significant increase in words per minute. (In other words, that the mean of the second half is larger than the mean of the first half.) However, I think that there are some issues with this, since the initial scores have a direct influence on the later scores due to improvement from practice. In other words, the post “treatment” results were a direct result of having completed the “pre treatment” results in the first place.

Sorry if that made NO SENSE whatsoever, again I’m a statistics newbie. What I’m getting at is, if this is the case, wouldn’t it be more appropriate to run some kind of test that takes the pooled variance into account? Granted I would prefer to avoid analysis of variance if possible, but again I’m looking to obtain the most statistically robust results possible.

Thanks for any help!
 
  • #2
Was this not the appropriate forum to post this thread in? This is not a homework problem.
 
  • #3
Applying statistics is a subjective matter and it is heavily influenced by tradition. (It sounds like you aren't worried about publishing a paper about your experiment, but if you are, it is advisable to look at other papers that got published and see what sort of statistical techniques impressed the editors of the publications.)

You didn't mention whether you had plotted your scores as a function of either time or the number of the practice session (first, second, third, etc.). That's where I would start.
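As a rough illustration of that first look at the data, here is a minimal sketch that averages the scores in consecutive blocks of races, so any trend over time shows up as a short list of numbers that can be plotted or just read. The function name `block_means`, the block size, and the example scores are all made up for illustration; they are not the real data from the spreadsheet.

```python
from statistics import mean


def block_means(scores, window):
    """Average consecutive, non-overlapping blocks of `window` scores.

    `scores` must be in the order the races were typed. A trailing
    partial block (fewer than `window` scores) is dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        mean(scores[i:i + window])
        for i in range(0, len(scores) - window + 1, window)
    ]


# Hypothetical scores in race order (not the poster's real results).
example_scores = [95, 97, 96, 99, 101, 100, 103, 102, 105, 104, 106, 108]
print(block_means(example_scores, 4))  # [96.75, 101.5, 105.75]
```

If the block means climb steadily, that already says something about improvement before any test is run.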

You say that you are a statistics newbie, so I don't know whether you understand the technical definitions of terms like "significance", "confidence" etc. The definitions of those concepts are not simple and they are not what a common sense type of person wants to know about real world situations! Is your purpose to apply statistics to the data in order to learn more about statistics? Or is your purpose to find out something about the data? If so, what? (If you are going to say "I want to know if my improvement is statistically significant", then please explain what you understand "statistically significant" to mean.)
 
  • #4
Thanks for your response. What I’m doing here is a little project on my own to help me understand statistics at a deeper level (beyond the one freshman level intro class I took two years ago), while at the same time having some fun with it. You mentioned confidence. I want to run this test at a 95% confidence level. Broadly speaking, I want there to only be a 5% chance that any improvement (increase in wpm) was by chance, and chance alone.
I suppose the question now is, given the data I have, what is the most appropriate way to measure “improvement” (or lack thereof)? Going back to my original idea: I would take a sample from the first half of the data and another from the second half, and then run an independent samples t-test to determine whether or not the observed increase in mean WPM was statistically significant (with a CL of .95). My question is whether or not there are things I have not taken into account, which might affect the validity of my model.

Thanks again,
James
 
  • #5
james121515 said:
I want to run this test at a 95% confidence level. Broadly speaking, I want there to only be a 5% chance that any improvement (increase in wpm) was by chance, and chance alone.

The term "confidence" applies when you desire to estimate a parameter. Suppose we take "wpm" to mean the mean wpm of the imagined distribution of all possible typing exercises that you could do (or do in the first half of your training, or however you wish to restrict the population). If you assume your wpm's are normally distributed and you have a certain number of samples from the population, you can state the radius of a 95% "confidence interval". The radius will be given in wpm and the "confidence" will be that if you repeated the experiment of collecting the same number of samples over and over again, then the probability that the sample mean will be within plus or minus that radius of the population mean is 0.95. (However, if you take the particular wpm from your data, like 120.3 wpm, you cannot claim that there is a 0.95 probability that this particular sample mean is within plus or minus that radius of the true population mean. That's the rub with using "confidence intervals".)
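To put a number on that radius, here is a minimal sketch using the figures quoted in post #1 (2821 races, standard deviation 11.14 wpm). It rests on assumptions of mine: all races are treated as one sample from a single normal population (which ignores any improvement over time), and the normal quantile is used in place of the t quantile because the sample is large. The function name `ci_radius` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_radius(sample_sd, n, confidence=0.95):
    """Half-width of a confidence interval for a population mean.

    Uses the normal quantile, which is a reasonable stand-in for the
    t quantile only when n is large.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sample_sd / sqrt(n)


# Figures from post #1: 2821 races, standard deviation 11.14 wpm.
radius = ci_radius(11.14, 2821)
print(round(radius, 2))  # 0.41
```

So under those assumptions the interval would be roughly the sample mean plus or minus 0.41 wpm, with the interpretation described above.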

The statement that there "only be a 5% chance that any improvement (increase in wpm) was by chance, and chance alone" sounds like you wish to do "hypothesis testing". (The 5% is a "significance level" or "critical p-value", not a "confidence" level.) You could assume two samples (like your earlier vs later typing exercises) are drawn from the same normal population and compute the probability of the difference in the means of the later minus the earlier sample being equal to or greater than the difference you observed. That does tell you whether there was a 5% (or less) probability that the observed result happened by chance, on the assumption that the two distributions are the same. (There is no question about the distributions being the same or not in this calculation. The phrase "by chance alone" isn't tested. It is assumed. There is no allowance for anything but chance. The distributions are assumed to be the same, or, if you wish, they are assumed the same with probability 1. Hence the 5% mentioned in the definition of the test does not tell you that a given outcome implies that there is only a 5% probability that the distributions are the same, or that there is a 95% probability they are different, etc. That's the rub with hypothesis testing. It tells you nothing about the probability that the hypothesis is true or false. It only tells you about the probability of the data given the truth of the hypothesis, not vice versa.)
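One way to see what "assumed to be the same" means is to simulate it. The sketch below is a permutation test, which is a swapped-in technique and not the normal-theory calculation described above: it pools the earlier and later scores, reshuffles them many times as if the labels were meaningless, and counts how often the reshuffled difference in means is at least as large as the observed one. The function name, the number of shuffles, and the example scores are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(earlier, later, n_shuffles=2000, seed=0):
    """One-sided p-value for mean(later) - mean(earlier).

    Estimates the probability of a difference at least as large as the
    observed one, ASSUMING both samples come from the same distribution.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = mean(later) - mean(earlier)
    pooled = list(earlier) + list(later)
    n_earlier = len(earlier)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = mean(pooled[n_earlier:]) - mean(pooled[:n_earlier])
        if diff >= observed:
            hits += 1
    # The +1 counts the observed labelling itself, so the estimate is never 0.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical scores (not the poster's real results).
earlier = [100, 101, 99, 102, 98, 100, 101, 99]
later = [110, 111, 109, 112, 108, 110, 111, 109]
print(permutation_p_value(earlier, later) < 0.05)  # True
```

Note that the result is still only the probability of the data given that the distributions are the same. It is not the probability that they are the same.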

The above are descriptions of "frequentist" statistics, which is the type usually taught in introductory courses. My personal preference is to use Bayesian statistics. I also prefer to use probability models and simulations as opposed to relying on statistics alone.
 
  • #6
What you want is a 2-sample t-test, which is used when you don't know the population standard deviation but you are sure that the data has a normal distribution or the sample is large. If the two groups can be assumed to have equal variances, use the pooled 2-sample t-test. If not, use the non-pooled (Welch) 2-sample t-test.
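For reference, here is a minimal sketch of how the two versions differ, written with the Python standard library only. The function names `pooled_t` and `welch_t` and the example numbers are made up for illustration. Each function returns the t statistic for "second sample minus first sample" together with its degrees of freedom; turning that into a p-value needs the t distribution, which Excel's `T.TEST` (type 2 for equal variances, type 3 for unequal) or a statistics library provides.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Pooled (equal-variance) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    # Weighted average of the two sample variances.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(b) - mean(a)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Non-pooled (Welch) t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical samples with equal sizes and equal variances.
first_half = [1, 2, 3, 4, 5]
second_half = [3, 4, 5, 6, 7]
print(pooled_t(first_half, second_half))
print(welch_t(first_half, second_half))
```

When the two samples have the same size, as they would for a first-half versus second-half split, the two t statistics coincide and only the degrees of freedom can differ.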
 
