Probability Value Question (left-tailed test)

Troy1 · Jul 4, 2017

I have a statistics problem that is probably not too difficult for someone who knows what they are doing, but I still need help with it. Here’s the scenario. There is a town with a highway built at the end of 2013. The mayor is concerned because of the number of traffic accidents on the highway each summer. The monthly numbers of accidents were:
2014 – June - 341
July - 315
August - 352
September - 364
2015 – June - 346
July - 356
August - 361
September - 314
2016 – June - 345
July - 352
August - 363
September - 346
So in 2017, the mayor spends money on an ad campaign to get people to drive more slowly and cut down on traffic accidents. He puts the ads in a brand new local paper whose circulation is slowly rising.
At the end of summer (assuming there has actually been a drop in accidents) the mayor wants to know if the drop was enough to conclude that the ads were effective in lowering accidents, or if it was just a matter of chance. What formula would the mayor use to calculate the probability of this for all four months in question, or (in the event that the newspaper circulation was not big enough until the end, and he only sees results then) just for the final month of the trial?

Thank you so much if you can help. I have looked online and in a statistics book, but haven't found an example exactly like what I am looking for. It is driving me crazy.

I like Serena · Jul 4, 2017

Hi Troy, welcome to MHB! (Smile)

The mayor would like to know if the traffic accidents after his campaign are significantly lower than before.
We could model the monthly traffic accidents as a normal distribution, and estimate its average and standard deviation from the summer months of 2014-2016.
The formula would be
$$z = \frac{x - \mu_0}{\sigma}$$
where $x$ is the number of traffic accidents in a month after his campaign in 2017, $\mu_0$ is the monthly average over 2014-2016, and $\sigma$ is the standard deviation of the monthly counts over 2014-2016.
If $z$ is below $-1.645$ we can conclude that traffic accidents were significantly lower at the 5% significance level.
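
As a rough illustration, the calculation could be sketched in Python as follows. The value `x_new = 300` is a made-up 2017 count, not data from the question, and the use of the population standard deviation (dividing by $n$) is an assumption; the sample form (dividing by $n-1$) would give a slightly larger $\sigma$.

```python
from statistics import mean, pstdev

# Monthly accident counts from the question (June-September, 2014-2016).
before = [341, 315, 352, 364, 346, 356, 361, 314, 345, 352, 363, 346]

mu0 = mean(before)      # baseline monthly average
sigma = pstdev(before)  # assumption: population standard deviation (divide by n)

def z_score(x, mu0, sigma):
    """Standardized distance of one new monthly count from the baseline."""
    return (x - mu0) / sigma

x_new = 300  # hypothetical 2017 monthly count, for illustration only
z = z_score(x_new, mu0, sigma)

print(f"mean = {mu0:.2f}, sd = {sigma:.2f}, z = {z:.3f}")
# Left-tailed test at the 5% significance level.
print("significant" if z < -1.645 else "not significant")
```

Only values of `z` below the cutoff count; a merely negative `z` is not enough.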

Troy1 · Jul 7, 2017

Okay, I am still having trouble with this. For the mean, I get 346.25. For the standard deviation, I get 15.86. So if I use the formula provided, and I plug in the number 250 for the new monthly total of traffic accidents, then subtract the mean and divide by the standard deviation, I get (negative) 6.07. If I plug in a new traffic accident number of 200, I get (negative) 9.22. Any reduction in monthly accidents would give us a negative number, so the fact that it is negative can't mean anything in itself. When the new total per month drops, the difference between the new number and the mean grows, meaning the quotient after being divided by the standard deviation gets bigger too as an absolute number. I have looked at it many times and can't see what I am doing wrong. Is it me, or is this formula somehow incorrect? Thanks to those math whizzes out there.

I like Serena · Jul 7, 2017

Indeed. It is not sufficient that $z$ is negative.
We need that $z<-1.645$, which is the critical $z$-value for a left-tailed test at a significance level of 5%.
In both of your examples that is the case.

Consider a reduction to, say, 320 traffic accidents per month.
Would that be significant?

Troy1 · Jul 7, 2017

Thank you, I think I see it now. So 320 accidents would give a z-score of -1.655, making it the threshold for being a significant result. This would be true even though in two of the twelve months under consideration from the past three years the number of accidents was lower than this (314 and 315)? How could the mayor explain to people that the results were not just due to random fluctuation, when the number had dipped so low before? This 320 number would be right on the edge, a p-value of 0.05, right? This might not convince many people. But suppose the number was much lower, in the 310s, 300s, 290s, or lower. How can we convert that into a p-value to show people how unlikely it is that the result is due to random chance? Also, I used to be under the impression that a p-value was the same as a percent chance, but in a book on statistics I was perusing last week, the author said this is not so. I am still not clear on this point.

I like Serena · Jul 7, 2017

Yes, if we have 320 accidents it might be a fluke.
Still, the reduction is statistically significant at the 5% level.
Under our model, if the ads had no effect, a month this low would turn up less than 5% of the time.
How can the people NOT believe that it is beneficial?
We're talking about saving dozens of people's lives here!
And anyway, we'll see the next month what happens, and either gain more confidence - or less.
Do note that no statistical conclusion is ever 100% sure - that's why it's statistics.

Note that there's a common pitfall: when the result is not significant, the mayor will try something else the next month, and something else again the month after. Then sooner or later one of the attempts will seem significant purely by chance (at the 5% level, about 1 in 20 tests of an ineffective measure comes out significant). In such cases we have to correct for the number of attempts.
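
To put a number on that pitfall, here is a small sketch. It assumes the attempts are independent, and it uses the Bonferroni correction as one common choice of correction; both are assumptions made for illustration.

```python
def family_wise_error(alpha, k):
    """Chance of at least one false positive in k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-test significance level that keeps the overall level at or below alpha."""
    return alpha / k

alpha = 0.05
for k in (1, 5, 20):
    print(f"k = {k:2d}  P(at least one false positive) = {family_wise_error(alpha, k):.3f}"
          f"  corrected per-test level = {bonferroni_alpha(alpha, k):.4f}")
```

With 20 attempts the chance of at least one spurious "significant" result is well over one half, which is why each individual test then has to meet a stricter level.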

If we have a lower number, we can determine the corresponding p-value from the z-score that we calculate.
Any statistical calculator can do that. For instance, a z-score of -2 corresponds to a left-tailed p-value of about 0.023.
Typically the p-value is reported alongside the (black and white) conclusion of a statistical analysis to support it.
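
If no statistical calculator is at hand, the conversion can be sketched with Python's standard library. The helper name `left_tail_p` is made up for this example; `NormalDist` is the standard normal distribution from the `statistics` module.

```python
from statistics import NormalDist

def left_tail_p(z):
    """P(Z <= z) for a standard normal Z, i.e. the p-value of a left-tailed z-test."""
    return NormalDist().cdf(z)

# A few z-scores and their left-tailed p-values.
for z in (-1.645, -2.0, -3.0):
    print(f"z = {z:+.3f}  p = {left_tail_p(z):.4f}")
```

The further $z$ falls below $-1.645$, the smaller the p-value becomes.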

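The p-value is the chance, assuming the ads had no effect at all, of seeing a number of accidents at least as low as the one we observed.
That is not quite the same as the chance that we're wrong to say there's a reduction, which is probably the distinction your book was making.
Do note that we have to be careful with our units: 1% equals 0.01.
So a p-value of 0.01 is the same as a p-value of 1%.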

Troy1 · Jul 7, 2017

This is great. Finally I think I understand it. Thank you so much. One last question. I see some standard normal distribution tables online, but how would I be able to look at one of them when setting up this problem and know exactly what number the z-score had to be to pass muster?

I like Serena · Jul 8, 2017

A z-table has the z-scores in the leftmost column, and an additional digit for the z-score in the topmost row.
Together they identify the p-value that is in the table itself.

Conversely, if we know we want a significance level of 5%, we can search the content of the table for 0.05 (or 0.95) and find the corresponding z-score in the leftmost column combined with the topmost row.
For 0.05 we should find -1.645, and for 0.95 we should find 1.645; which of the two appears depends on which tail the table lists.
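
The same reverse lookup can be sketched in Python instead of scanning a printed table. The variable names are chosen for this example; `inv_cdf` is the inverse of the cumulative distribution function in the standard `statistics` module.

```python
from statistics import NormalDist

alpha = 0.05  # chosen significance level

# z-value with 5% of the standard normal distribution to its left.
z_crit_left = NormalDist().inv_cdf(alpha)
# z-value with 95% to its left (same magnitude, opposite sign).
z_crit_right = NormalDist().inv_cdf(1 - alpha)

print(f"left-tailed critical z  = {z_crit_left:.3f}")
print(f"right-tailed critical z = {z_crit_right:.3f}")
```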

Troy1 · Jul 8, 2017

Thank you. I think I see now how you arrived at that number. Actually I thought of another question. If the number we come up with is only 320 for the final month, perhaps because of low newspaper circulation or some other plausible explanation, we get a significant value. But if the number were 320 for all four months in question, it would definitely be more convincing proof. How would the added strength of this evidence be reflected in the math, or would it be?

I like Serena · Jul 8, 2017

When we have more than 1 measurement, we'd do a slightly different test: Student's t-test.
In particular we'd compare the mean after with the mean before, and divide the difference by the so-called standard error SE.
The corresponding formula (leaving out the detail how to calculate SE) is:
$$t=\frac{\bar x_{after} - \bar x_{before}}{SE}$$
The t-score is similar to the z-score. We just use a different table.
With more data we should find a lower (more significant) p-value for the same reduction in traffic accidents.
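
As a rough sketch of what that could look like, the snippet below fills in SE with the pooled-variance (equal-variance) form of the two-sample t-test. That choice is an assumption, since the formula above leaves SE open, and the four `after` values of 320 are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Monthly accident counts from the question (June-September, 2014-2016).
before = [341, 315, 352, 364, 346, 356, 361, 314, 345, 352, 363, 346]
after = [320, 320, 320, 320]  # hypothetical 2017 counts, for illustration only

def pooled_t(after, before):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    n_a, n_b = len(after), len(before)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(after) + (n_b - 1) * variance(before)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(after) - mean(before)) / se, df

t, df = pooled_t(after, before)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

For 14 degrees of freedom a t-table gives a left-tailed 5% critical value of about $-1.761$, so four months at 320 clear the threshold by a much wider margin than the single month did in the z-test.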

Troy1 · Jul 8, 2017

You have been an amazing help. I will recommend your site to everyone who I think might possibly be interested. Thank you so much! I will be back someday when I can't solve a math problem and don't know where to turn!
