Perform a significance test

mathmari

Well-known member
MHB Site Helper
Hey!!

In a study of $15$ ten-year-old children, the number of hours of television watched per week and the pounds above or below the ideal body weight were recorded (high positive values = overweight).

1. Determine the simple linear regression equation by considering the weights above the ideal body weight as a dependent variable.
2. Perform a significance test for the slope of the regression line at significance level $\alpha = 5\%$ (using p-values).
3. Perform a significance test of the criterion F at significance level $\alpha = 0.05$ (using p-values).
4. Determine the confidence interval for the average weight in pounds for a child who watches television for $36$ hours a week and for a child who watches television for $30$ hours a week. Which confidence interval is greater and why?

I have done the following:

1. At the beginning I calculated the following:

Using this information we get:
\begin{align*}&\nu =15 \\ &\overline{X}=\frac{\sum X}{\nu}=\frac{472}{15}=31.47 \\ &\overline{Y}=\frac{\sum Y}{\nu}=\frac{86}{15}=5.73 \\ &\hat{\beta}=\frac{\nu \sum \left (XY\right )-\left (\sum X\right )\left (\sum Y\right )}{\nu\sum X^2-\left (\sum X\right )^2}=\frac{15 \cdot 3356-472\cdot 86}{15\cdot 15524-472^2}=\frac{50340-40592}{232860-222784}=\frac{9748}{10076}=0.97 \\ & \hat{\alpha}=\overline{Y}-\hat{\beta}\cdot \overline{X}=5.73-0.97\cdot 31.47=5.73-30.5259=-24.80\end{align*}

Therefore the linear regression equation, with the pounds above the ideal weight as the dependent variable, is: \begin{equation*}\hat{Y}=0.97X-24.80\end{equation*}
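As a cross-check, the two estimates can be recomputed from the sums without rounding the intermediate values. This is only a minimal Python sketch: the variable names are my own, and the only inputs are the sums used in the calculation above.

```python
# Sums taken from the calculation above.
n = 15
sum_x = 472      # hours of television per week
sum_y = 86       # pounds above the ideal weight
sum_xy = 3356
sum_x2 = 15524

# Least-squares slope and intercept for Y = alpha + beta * X.
beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
alpha_hat = sum_y / n - beta_hat * sum_x / n

print(f"slope     = {beta_hat:.4f}")
print(f"intercept = {alpha_hat:.4f}")
```

Carried at full precision, the intercept comes out near $-24.71$ rather than $-24.80$; the gap comes from rounding the slope and the means before the final subtraction.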

The graph looks as follows:

2. We want to test the null hypothesis that the slope of the regression line is $0$.

I found some notes and according to these I did the following:

Since p-value < α (or |t| > t-crit) we reject the null hypothesis, and so we can’t conclude that the population slope is zero.

Is this correct?

But these calculations give a different slope than the one I got in the first question, don't they? Here we have $b=0.91$, and in the first question I got $\hat{\beta}=0.97$.
So did I do something wrong in the calculation of the linear regression equation?

Klaas van Aarsen

MHB Seeker
Staff member
Hey mathmari !!

Let's rephrase the hypothesis in your step 2: we want to test the alternative hypothesis that the slope of the regression line is not $0$, against the null hypothesis that it is $0$.

Since we have a 2-sided test we need to compare the p-value with α/2.
If it is below - and see below for an apparent calculation mistake - then we conclude that the slope is significantly different from zero.
Or put otherwise, that there is a significant linear correlation between X and Y.
Note that we can never conclude that the population slope is 0. At best we do not have sufficient information to conclude that it is different.

As for the two different slopes: it looks as if there is a mistake in the second calculation.
I get different values for $s_X$ and $s_Y$: I have $s_X=6.92$ and $s_Y=7.648$.
Perhaps the Excel range was not set correctly?

mathmari

So do we have the following?

Since p-value < α/2 (or |t| > t-crit) we reject the null hypothesis, and so we conclude that the slope is significantly different from zero.

Ah yes, I found my mistake in the Excel commands.

Now I get:

So now it is the same slope as I found in the first question!

Klaas van Aarsen

I've just noticed that you've used [M]=TDIST(x, df, tails=2)[/M] to calculate the p-value. If I'm not mistaken it means that the factor 2 has already been taken care of so that we can compare the p-value and α directly.

And yes, we conclude that the slope is significantly different from zero.

Good!

mathmari

Great!!

Could you give me a hint for question 3? What exactly is the criterion F?

Klaas van Aarsen

You have just executed a t-test to test whether the slope is different from 0.
As I understand it, we can also do an F-test for the same thing.
An F-test tests whether 2 variances are different. The F-value is the ratio between those 2 variances.

mathmari

In Excel I used the "F-Test for the variances of two samples" tool and got the following:

Is this correct, i.e. did I give the correct inputs?

Klaas van Aarsen

I don't think so.
It appears you have compared the variance of the inputs (X) with the variance of the outputs (Y).
But that does not really say whether they are correlated, does it?

Perhaps we should search for what kind of F-test we can do within the context of a linear regression.
It should compare the 'explained' variance with the 'unexplained' variance.

mathmari

The explained variance is the sum of the squared differences between each predicted Y-value and the mean of Y.

The unexplained variance is the sum of the squared differences between the Y-value of each ordered pair and the corresponding predicted Y-value.

Right?

Is the F-value the fraction of these two values?

If yes, then we have the following:
\begin{equation*}F=\frac{\text{explained variance}}{\text{unexplained variance}}=\frac{632.0347}{190.2276}=3.32252\end{equation*}
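The two sums described above can be sketched in a few lines of Python. The data here are made up purely for illustration (they are not the study's measurements), and all names are my own.

```python
# Made-up data, for illustration only (not the study's measurements).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit of y = intercept + slope * x.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

# Explained part: predicted values against the mean of Y.
ss_explained = sum((yh - y_bar) ** 2 for yh in y_hat)
# Unexplained part: observed values against their predictions.
ss_unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
# Total variation of Y around its mean.
ss_total = sum((yi - y_bar) ** 2 for yi in y)

print(ss_explained, ss_unexplained, ss_total)
```

For a least-squares line with an intercept, the explained and unexplained parts add up to the total sum of squares, which is a handy check on a spreadsheet table.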

mathmari

Oh sorry, I forgot to upload the table:

Klaas van Aarsen

What you describe are sums of squares, typically abbreviated as SSM and SSE, not variances yet.
To find the variances we still need to divide by the corresponding degrees of freedom (DFM and DFE), don't we?

And yes, the F-value is the ratio of the explained variance to the unexplained variance.
But I think the numbers for the variances are not correct yet.

mathmari

Oh ok!

So we have that DFM = p - 1, where p is the number of regression parameters, which is 2 in this case, and so we get DFM = 2-1=1, or not?

We also have that DFE = n - p, where n is the number of observations, and so we get DFE = 15 - 2 =13, or not?

Klaas van Aarsen

Yep.

mathmari

So using the table of post #10 we get
\begin{align*}&SSM=632.0347 \\ &DFM=2-1=1 \\ &SSE=190.2276 \\ &DFE=15-2=13 \\ &MSM=\frac{SSM}{DFM}=\frac{632.0347}{1}=632.0347 \\ &MSE=\frac{SSE}{DFE}=\frac{190.2276}{13}=14.6329 \\ &F=\frac{MSM}{MSE}=\frac{632.0347}{14.6329}=43.1927\end{align*}
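The same arithmetic as a small Python sketch, in case it helps to check the spreadsheet. The two sums of squares are the table values quoted above; the variable names are my own.

```python
ss_model = 632.0347   # SSM from the table
ss_error = 190.2276   # SSE from the table
n = 15                # number of observations
p = 2                 # regression parameters (intercept and slope)

# Degrees of freedom for the model and for the error.
df_model = p - 1
df_error = n - p

# Mean squares are the sums of squares divided by their degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error

print(f"MSM = {ms_model:.4f}, MSE = {ms_error:.4f}, F = {f_value:.4f}")
```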

Now we have to find the confidence interval for the test statistic with $\alpha=0.05$, right? We look in the F-table at the $0.05$ entry for $1$ df in the numerator and $13$ df in the denominator.

In R, evaluating qf(0.95, 1, 13) gives 4.667193.

Is everything correct so far?

How is the confidence interval defined with these data?

Klaas van Aarsen

I have found the F-value 42.967. That is more or less the same F-value. Good.
The difference is probably caused by early rounding.

And you have found a critical F-value.
But shouldn't we find a p-value to compare with $\alpha$? And draw a conclusion?

You mean a confidence interval for the F-test?
The F-test is a 1-sided test in this case, and generally a confidence interval belongs to a 2-sided test.
So I don't think we should calculate a confidence interval in this case.

mathmari

So shouldn't I have calculated that F value? How do we calculate the p value?

Klaas van Aarsen

You found a function in R to calculate the critical F-value from $\alpha$.
Isn't there a similar function to calculate the p-value from the F-value?

mathmari

Using the function pf(42.967, 1, 13, lower.tail=F) we get 1.839458e-05.

Is the function correct?

Klaas van Aarsen

Yep.
Previously you used the t-test to find the p-value for the slope. Now we used the F-test. The result should be the same shouldn't it? Is it?
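In a simple linear regression the F-value equals the square of the t-value for the slope, so both tests give the same p-value. Here is a rough Python sketch of that check. It uses only the sums from the first question and the SSE from the table; the names and the numerical integration (Simpson's rule on the t density, standing in for a library function such as R's pf) are my own choices, not what Excel or R do internally.

```python
import math

# Summary values quoted earlier in the thread.
n = 15
sum_x, sum_y, sum_xy, sum_x2 = 472, 86, 3356, 15524
ss_error = 190.2276          # SSE from the table
df_error = n - 2

# Slope and its standard error.
s_xx = sum_x2 - sum_x ** 2 / n
slope = (sum_xy - sum_x * sum_y / n) / s_xx
se_slope = math.sqrt(ss_error / df_error / s_xx)
t_value = slope / se_slope

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0

p_value = two_sided_p(t_value, df_error)
print(f"t = {t_value:.4f}, t^2 = {t_value ** 2:.3f}, p = {p_value:.3e}")
```

The squared t-value lands very close to the F-value 42.967, and the p-value is of the same order as the one returned by pf.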

mathmari

Ah yes, they are the same!

So now we compare the p-value with $\alpha$, or not? So do we have the following?

Since p-value < α we reject the null hypothesis, and so we conclude that the slope is significantly different from zero.

As for the question 4, how is the confidence interval defined, which formula do we use?


Klaas van Aarsen

Yep.

As for question 4, we are looking for the confidence interval of a point estimate in a simple linear regression.
I found a formula here, here and here.
Wikipedia gives a confidence band formula for the same thing.

mathmari

So do we have the following?

That would mean that the confidence interval is $[7.541854251, \ 12.69633551]$.

Is that correct?

Klaas van Aarsen

I didn't check the numbers, but the approach seems to be correct.
Still, didn't the question ask for a child who watches television for 30 hours a week as well? And the corresponding confidence interval?

mathmari

For that we do the same just replacing the 36 hours by 30 hours, or not?

Klaas van Aarsen

I guess so, assuming your previous approach was correct which seems plausible.
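For reference, the usual interval for the mean response at $X=x_0$ is $\hat{Y}_0 \pm t_{\alpha/2,\,\nu-2}\cdot s\cdot\sqrt{\frac{1}{\nu}+\frac{(x_0-\overline{X})^2}{S_{XX}}}$ with $s=\sqrt{MSE}$ and $S_{XX}=\sum X^2-\frac{(\sum X)^2}{\nu}$. A minimal Python sketch of it follows. It assumes the sums from the first question, the SSE from the table, and the table value $t_{0.975,13}\approx 2.160$; the function name mean_ci is my own.

```python
import math

n = 15
sum_x, sum_y, sum_xy, sum_x2 = 472, 86, 3356, 15524
ss_error = 190.2276      # SSE from the table earlier in the thread
t_crit = 2.160           # table value t(0.975, 13), assumed here

x_bar = sum_x / n
s_xx = sum_x2 - sum_x ** 2 / n
slope = (sum_xy - sum_x * sum_y / n) / s_xx
intercept = sum_y / n - slope * x_bar
s = math.sqrt(ss_error / (n - 2))   # residual standard error

def mean_ci(x0):
    """95% confidence interval for the mean of Y at X = x0."""
    y0 = intercept + slope * x0
    half = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y0 - half, y0 + half

for hours in (36, 30):
    lo, hi = mean_ci(hours)
    print(f"{hours} h: [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")
```

The interval at 36 hours agrees with the one posted above to about two decimals. The interval at 30 hours comes out narrower, because 30 is closer to $\overline{X}\approx 31.47$ than 36 is, and the term $(x_0-\overline{X})^2$ under the square root grows with that distance.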