# [SOLVED] Heights of the employees of a company

#### mathmari

##### Well-known member
MHB Site Helper
Hey!!

In the following table there are the heights of the employees of a company:

1. Calculate the mean value, the variance and the standard deviation of the heights of the employees.
2. Determine the distribution of sampling with replacement of average number of children of each employee for sample size $2$.
3. Which is the mean value and the variance of the sampling average?
4. Which is the mean value of the sampling variances?
5. Which is the mean value of the sampling standard deviations?

I have done the following:
1. The mean value is equal to $\frac{1436}{8}=179.5$.

The variance is equal to $\displaystyle{s^2=\frac{\sum_{i=1}^n\left (x_i-\overline{x}\right )^2}{n}}$.

The standard deviation is equal to $\displaystyle{s=\sqrt{108.5}=10.42}$.

2. I don't really understand this question. How is this related to the given data? Is there maybe a typo and instead of number of children it should be the height? But even in this case, I don't understand what we have to do here. Do you have an idea?

#### Klaas van Aarsen

##### MHB Seeker
Staff member
Hey mathmari !!

Yes. It looks like a copy-paste mistake. It should be height instead of number of children.

It looks like an exercise about the difference between populations, samples, and the probability distribution when repeatedly drawing a sample (with replacement) from a population.

In this case the complete population consists of 8 employees.
It means that in (1) we are not calculating a sample mean and sample variance, but a population mean and population variance. Conventionally Greek symbols are used for those instead of Latin symbols.
So we have $\mu = \frac{\sum x_i}{N}=179.5$ and $\sigma^2=\frac{\sum (x_i - \mu)^2}{N}=108.5$. I'm using $N=8$ here to distinguish the total number of people in the population from the $n$ people that we have in a sample.
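As a quick check of these population formulas, here is a minimal Python sketch. The heights below are hypothetical (the thread's table is not reproduced here); they were chosen only so that they sum to 1436 and give a population variance of 108.5.

```python
from statistics import fmean, pstdev, pvariance

# Hypothetical heights in cm, NOT the thread's actual table.
# Chosen only so that sum = 1436 and population variance = 108.5.
heights = [169, 165, 170, 195, 193, 178, 181, 185]

mu = fmean(heights)          # population mean: sum(x) / N
sigma2 = pvariance(heights)  # population variance: divides by N, not N - 1
sigma = pstdev(heights)      # population standard deviation

print(mu, sigma2, round(sigma, 2))
```

The `p` in `pvariance` and `pstdev` stands for "population": both divide by $N$, which is the convention used in this post.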

In (2) and following we are repeatedly drawing samples of size 2 out of the complete population of size 8.
And we look at the sample means of those samples.
The question is then about what we can say about the probability distribution of the sample means.
As it is, the Central Limit Theorem says that it will tend to a normal distribution.

#### mathmari

##### Well-known member
MHB Site Helper
But the Central Limit Theorem assumes that $n$ tends to infinity, or not? But we have $n=2$.

#### Klaas van Aarsen

##### MHB Seeker
Staff member
We are creating a new type of super-sample by repeatedly drawing a sample.
The result is a sample of sample-means with an as yet unspecified size.

Note that we have 3 different $n$'s that we must distinguish.
• The population size $n_\text{population}=8$.
• The single shot sample size $n_x=2$ from which we calculate a sample-mean.
• The size of the sample of sample-means $n_{\bar x}$ that is as yet unspecified.
It is $n_{\bar x}$ that tends to infinity so that we can apply the Central Limit Theorem.

#### mathmari

##### Well-known member
MHB Site Helper
Ahh I see!!

So from the Central Limit Theorem we get the following:

From the population that has some distribution with mean value $\mu$ and variance $\sigma^2$, we choose random samples of size $n_{\bar x}$ and we calculate the mean, then for big $n_{\bar x}$ (theoretically $n_{\bar x}\rightarrow \infty$) the distribution of these means (sample-means) is approximately normal with mean value $\mu$ and variance $\frac{\sigma^2}{n_{\bar x}}$.

Is this correct?

#### Klaas van Aarsen

##### MHB Seeker
Staff member
That doesn't look correct. Or at least not what is intended for the problem.
I'm afraid I've misinterpreted the question after all. Sorry for that.

Let's go back to square one and forget about the Central Limit Theorem for now.
The distribution of the employees looks like this:

This is actually a histogram in which each height in the range is a separate bin.

We are drawing samples of size 2 with replacement.
So the possible samples are (employee 1, employee 1), (employee 1, employee 2), ..., (employee 8, employee 8).
Each sample has its own average height, sample variance, and sample standard deviation.
The distribution of the average height of those samples is then the list of all possible average heights combined with the frequency that they occur.

We might make a table as follows:
\begin{array}{|c|c|c|c|c|}\hline
\text{Empl1} & \text{Empl2} & \text{AvHeight} & \text{Var} & \text{Stdev} \\ \hline
1 & 1 & 169 & & \\
1 & 2 & 167 & & \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
8 & 8 & 185 & & \\ \hline
\end{array}
If we make a histogram with each height that is in range as a separate bin, we get the distribution of sampling with replacement of average height of each employee for sample size 2. That is what the question asks for.
We should see that the resulting histogram looks a little more like a normal distribution than the original histogram.
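The enumeration described above can be sketched in Python as follows. The heights are hypothetical stand-ins (the thread's table is not reproduced here), so the printed frequencies illustrate the method rather than the actual answer.

```python
from collections import Counter
from itertools import product

# Hypothetical heights in cm, NOT the thread's actual table.
heights = [169, 165, 170, 195, 193, 178, 181, 185]

# All ordered samples of size 2 drawn with replacement: 8 * 8 = 64 pairs.
samples = list(product(heights, repeat=2))

# Frequency of each possible sample mean; every pair has probability 1/64.
mean_counts = Counter((a + b) / 2 for a, b in samples)

for sample_mean, count in sorted(mean_counts.items()):
    print(f"{sample_mean:6.1f}  {count:2d}  {count / len(samples):.4f}")
```

Each printed row is one bin of the histogram: the sample mean, how often it occurs among the 64 samples, and its probability.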

#### mathmari

##### Well-known member
MHB Site Helper
In the table we have Empl1 1 and Empl2 2. Do we also consider the case Empl1 2 and Empl2 1, or do we treat these cases as the same?

#### Klaas van Aarsen

##### MHB Seeker
Staff member
These are different samples. If we want to list all samples we have to distinguish them.
We might compress the list though, but then we need to keep track that this combination occurs twice, and weight the average height accordingly.

#### mathmari

##### Well-known member
MHB Site Helper
If I did everything correctly, we get the following frequencies for the average heights:

And the corresponding histogram is:

It looks more like a normal distribution than the original histogram.

So, is everything correct so far?

#### Klaas van Aarsen

##### MHB Seeker
Staff member
Looks good to me.

#### mathmari

##### Well-known member
MHB Site Helper
Great!!

For question 3, do we calculate the mean value and the variance from the table in post #9?

Or do we use the table of all possible samples:

And we add all the average heights and divide by the number of them?

I got stuck right now.


#### Klaas van Aarsen

##### MHB Seeker
Staff member
We can do both.
However, we get the most accurate results from the table of all possible samples.

#### mathmari

##### Well-known member
MHB Site Helper
Ok!! We have the following:

Therefore the mean value of the sampling average is equal to $$\frac{\text{Sum of Average Heights}}{\text{Sum of Frequencies}}=\frac{6462}{64}=100.97$$ right?

As for the variance it is $\displaystyle{s^2=\frac{\sum_{i=1}^{64}\left (x_i-100.97\right )^2}{64}}$ or is there also another way to calculate this?

#### Klaas van Aarsen

##### MHB Seeker
Staff member
Therefore the mean value of the sampling average is equal to $$\frac{\text{Sum of Average Heights}}{\text{Sum of Frequencies}}=\frac{6462}{64}=100.97$$ right?
Not quite.
It should be the same as the average height of the population, which is 179.5.
Did you perhaps forget to weight the average heights with their frequencies? That would account for a factor of slightly less than 2 as is the case.
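One way to check this hint, sketched in Python with hypothetical heights (the thread's table is not reproduced here; these values are chosen only so that they sum to 1436): list each unordered pair once, attach frequency 1 or 2, and compare the plain sum of the row averages with the frequency-weighted sum.

```python
from itertools import combinations_with_replacement

# Hypothetical heights in cm, NOT the thread's actual table (sum = 1436).
heights = [169, 165, 170, 195, 193, 178, 181, 185]
n = len(heights)

# Compressed table: each unordered pair of employees appears once (36 rows).
rows = []
for i, j in combinations_with_replacement(range(n), 2):
    freq = 1 if i == j else 2  # (i, j) and (j, i) are distinct ordered samples
    rows.append(((heights[i] + heights[j]) / 2, freq))

total_freq = sum(f for _, f in rows)        # number of ordered samples
plain_sum = sum(m for m, _ in rows)         # frequencies ignored
weighted_sum = sum(m * f for m, f in rows)  # frequencies applied

print(len(rows), total_freq, plain_sum, weighted_sum, weighted_sum / total_freq)
```

Both sums depend only on the total of the heights, so for any table that totals 1436 they come out as 6462 and 11488. Dividing the unweighted 6462 by the weighted count 64 is what produces 100.97.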

As for the variance it is $\displaystyle{s^2=\frac{\sum_{i=1}^{64}\left (x_i-100.97\right )^2}{64}}$ or is there also another way to calculate this?
That is not the formula of the sample variance.

The correct formula is:
$$s^2=\frac{\sum_{i=1}^n (x_i-\bar x)^2}{n-1}$$
Note in particular the $n-1$ in the denominator.
It is used because the population mean is unknown and has to be estimated from this sample alone, by $\bar x$.

Next, we need the sample variance of every sample, which uses the average height of every sample.
Suppose we have a sample with employee heights $(x_1, x_2)$.
Then the sample variance is:
$$s^2=\frac{\sum_{i=1}^n (x_i-\bar x)^2}{n-1} = \frac{(x_1-\bar x)^2 + (x_2-\bar x)^2}{2-1}$$
where $\bar x = \frac{x_1+x_2}{2}$, which is in the $\text{Av.Height}$ column that you already have.

When we have the variances of all the samples, we can calculate their mean by summing them and dividing by the total number. Oh, and don't forget to 'weight' them by their frequency.
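For samples of size 2 the sample variance collapses to a one-line expression, which may make the table quicker to fill in. A short derivation, using only the formula above:

$$x_1-\bar x = \frac{x_1-x_2}{2},\qquad x_2-\bar x = -\frac{x_1-x_2}{2} \quad\Longrightarrow\quad s^2 = \frac{(x_1-x_2)^2}{4}+\frac{(x_1-x_2)^2}{4} = \frac{(x_1-x_2)^2}{2}$$

For example, the sample (employee 1, employee 2) has heights $169$ and $165$ (this follows from the averages $169$ and $167$ in the table earlier in the thread), so $s^2 = \frac{4^2}{2} = 8$ and $s=\sqrt 8\approx 2.83$. Any sample that draws the same employee twice has $s^2=0$.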

#### mathmari

##### Well-known member
MHB Site Helper
Not quite.
It should be the same as the average height of the population, which is 179.5.
Did you perhaps forget to weight the average heights with their frequencies? That would account for a factor of slightly less than 2 as is the case.
Yes, that was the mistake! Thanks for the hint!

Next, we need the sample variance of every sample, which uses the average height of every sample.
Suppose we have a sample with employee heights $(x_1, x_2)$.
Then the sample variance is:
$$s^2=\frac{\sum_{i=1}^n (x_i-\bar x)^2}{n-1} = \frac{(x_1-\bar x)^2 + (x_2-\bar x)^2}{2-1}$$
where $\bar x = \frac{x_1+x_2}{2}$, which is in the $\text{Av.Height}$ column that you already have.

When we have the variances of all the samples, we can calculate their mean by summing them and dividing by the total number. Oh, and don't forget to 'weight' them by their frequency.

So we have the following:

From that we get that the mean value of the sampling average is equal to $$\frac{\text{Sum of Average Heights * Frequency}}{\text{Sum of Frequencies}}=\frac{11488}{64}=179.5$$

And the variance of the sampling average is equal to $$\frac{\text{Sum of Variance * Frequency}}{\text{Sum of Frequencies}}=\frac{6944}{64}=108.5$$ which is again the same as variance of population. So these two values have to be always the same, right?

At question 4, the mean value of the sampling variances is asked. Do we have to find the mean value of the $6$th column of the above table?

#### Klaas van Aarsen

##### MHB Seeker
Staff member
You just did that. So you have just answered question 4.

Rereading question 3, I see that your original formula was more or less correct after all. It is the variance of the entire population (of sample-means).
That is, it should be:
$$\sigma_{\bar x}^2 = \frac{\sum_i f_i\cdot(\bar x_i - \mu_{\bar x})^2}{\sum_i f_i}$$
where $\mu_{\bar x}=179.5$, which is the mean value of the sampling average that you just calculated, where $\bar x_i$ are the sampling averages, and where $f_i$ are the frequencies of those sampling averages.
Note the use of Greek letters since we are talking about the entire population of possible samples.
Can we predict what it will be? That is, how it is related to the variance of the original population?

#### mathmari

##### Well-known member
MHB Site Helper
You just did that. So you have just answered question 4.
I got stuck right now. Wasn't this the answer of question 3, i.e. the variance of the sampling average?

Rereading question 3, I see that your original formula was more or less correct after all. It is the variance of the entire population (of sample-means).
That is, it should be:
$$\sigma_{\bar x}^2 = \frac{\sum_i f_i\cdot(\bar x_i - \mu_{\bar x})^2}{\sum_i f_i}$$
where $\mu_{\bar x}=179.5$, which is the mean value of the sampling average that you just calculated, where $\bar x_i$ are the sampling averages, and where $f_i$ are the frequencies of those sampling averages.
Note the use of Greek letters since we are talking about the entire population of possible samples.
Can we predict what it will be? That is, how it is related to the variance of the original population?
It will be the same, or not? But why?

#### Klaas van Aarsen

##### MHB Seeker
Staff member
I got stuck right now. Wasn't this the answer of question 3, i.e. the variance of the sampling average?
Question 3 asks for the variance of the sampling averages.
Question 4 asks for the average of the sampling variances.

For question 3 we have:
$$\mu_{\bar x} = \text{Mean value of sampling averages} = \frac{\sum f_i \cdot \bar x_i}{\sum f_i}$$
and:
$$\sigma_{\bar x}^2 = \text{Variance of sampling averages} = \frac{\sum f_i \cdot (\bar x_i - \mu_{\bar x})^2}{\sum f_i}$$

For question 4 we have:
$$s_i^2 = \text{Sampling variance of sample }i = \frac{\sum_j (x_{i,j} - {\bar x_i})^2}{n_x-1}$$
and:
$$\mu_{s^2} = \text{Mean value of sampling variances} = \text{Average of sampling variances} = \frac{\sum f_i \cdot s_i^2}{\sum f_i}$$

See the difference?

It will be the same, or not? But why?
I don't think that they will be the same. Perhaps calculate it and see?
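To "calculate it and see", here is a minimal Python sketch that evaluates the formulas for questions 3 to 5 over all 64 ordered samples (so every $f_i=1$). The heights are hypothetical, chosen only to match the thread's population mean 179.5 and variance 108.5; the mean of the standard deviations therefore need not match the value obtained from the real table.

```python
from itertools import product
from math import sqrt
from statistics import fmean, variance

# Hypothetical heights in cm, NOT the thread's actual table.
# They match the thread's population mean (179.5) and variance (108.5).
heights = [169, 165, 170, 195, 193, 178, 181, 185]

samples = list(product(heights, repeat=2))    # 64 equally likely ordered samples

sample_means = [fmean(s) for s in samples]
sample_vars = [variance(s) for s in samples]  # denominator n_x - 1 = 1
sample_sds = [sqrt(v) for v in sample_vars]

mu_xbar = fmean(sample_means)                               # question 3: mean of sample means
var_xbar = fmean((m - mu_xbar) ** 2 for m in sample_means)  # question 3: variance of sample means
mean_s2 = fmean(sample_vars)                                # question 4: mean of sample variances
mean_s = fmean(sample_sds)                                  # question 5: mean of sample std devs

print(mu_xbar, var_xbar, mean_s2, round(mean_s, 2))
```

The first three numbers depend only on the population mean and variance; the fourth depends on the individual heights.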

#### mathmari

##### Well-known member
MHB Site Helper
Question 3 asks for the variance of the sampling averages.
Question 4 asks for the average of the sampling variances.

For question 3 we have:
$$\mu_{\bar x} = \text{Mean value of sampling averages} = \frac{\sum f_i \cdot \bar x_i}{\sum f_i}$$
and:
$$\sigma_{\bar x}^2 = \text{Variance of sampling averages} = \frac{\sum f_i \cdot (\bar x_i - \mu_{\bar x})^2}{\sum f_i}$$

For question 4 we have:
$$s_i^2 = \text{Sampling variance of sample }i = \frac{\sum_j (x_{i,j} - {\bar x_i})^2}{n_x-1}$$
and:
$$\mu_{s^2} = \text{Mean value of sampling variances} = \text{Average of sampling variances} = \frac{\sum f_i \cdot s_i^2}{\sum f_i}$$

See the difference?
We have this table.

From this we get the following values:

For question 3 :
$$\mu_{\bar x} = \text{Mean value of sampling averages} = \frac{11488}{64}=179.5$$
and:
$$\sigma_{\bar x}^2 = \text{Variance of sampling averages} = \frac{3472}{64}=54.25$$

For question 4 :
$$\mu_{s^2} = \text{Mean value of sampling variances} = \text{Average of sampling variances} = \frac{6944}{64}=108.5$$

For question 5 :
$$\mu_{s} = \text{Mean value of sampling standard deviations} = \text{Average of sampling standard deviations} = \frac{526.09}{64}=8.22$$

Is everything correct?

I don't think that they will be the same. Perhaps calculate it and see?
The variance of the population is $108.5$ which is equal to the mean value of sampling variances. The variance of sampling averages is different.

#### Klaas van Aarsen

##### MHB Seeker
Staff member
We have this table.

From this we get the following values:

For question 3 :
$$\mu_{\bar x} = \text{Mean value of sampling averages} = \frac{11488}{64}=179.5$$
and:
$$\sigma_{\bar x}^2 = \text{Variance of sampling averages} = \frac{3472}{64}=54.25$$

For question 4 :
$$\mu_{s^2} = \text{Mean value of sampling variances} = \text{Average of sampling variances} = \frac{6944}{64}=108.5$$

For question 5 :
$$\mu_{s} = \text{Mean value of sampling standard deviations} = \text{Average of sampling standard deviations} = \frac{526.09}{64}=8.22$$

Is everything correct?
Yep. All correct.

The variance of the population is $108.5$ which is equal to the mean value of sampling variances. The variance of sampling averages is different.
The variance of sampling averages is $54.25$.
Isn't that half of $108.5$? And didn't we have $n_x=2$?

We have:
$$\sigma_{\bar x}^2 = \frac{\sigma^2}{n_x}$$
and:
$$\sigma_{\bar x} = SE = \text{Standard Error} = \frac{\sigma}{\sqrt{n_x}}$$
We see an $SE$ in most statistical tests.
The $SE$ we have just found, is the one that applies to the z-test.
That is, if we want to know if 2 groups have a different population mean, we typically test if their sampling averages differ by at least $1.96 \cdot SE$. If they do, the groups are significantly different.
(The factor $1.96$ is the critical z-value for a 2-sided z-test with significance level $\alpha=5\%$.)
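Both numbers can be checked with Python's standard library. This is only a sketch: it takes the population variance $108.5$ and $n_x=2$ from the thread and uses `statistics.NormalDist` for the standard normal quantile.

```python
from math import sqrt
from statistics import NormalDist

sigma2 = 108.5  # population variance from the thread
n_x = 2         # sample size

se = sqrt(sigma2 / n_x)  # standard error of the sample mean
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical z-value

print(round(se ** 2, 2), round(se, 3), round(z_crit, 2))
```

Squaring `se` recovers the variance of the sampling averages, $54.25$.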

#### mathmari

##### Well-known member
MHB Site Helper
Yep. All correct.
The mean value of sampling variances is the same as the variance of population. But the mean value of sampling standard deviations is not equal to the standard deviation of the population.

Why is it like that? Or do both have to be equal and I have done something wrong?

The variance of sampling averages is $54.25$.
Isn't that half of $108.5$? And didn't we have $n_x=2$?

We have:
$$\sigma_{\bar x}^2 = \frac{\sigma^2}{n_x}$$
and:
$$\sigma_{\bar x} = SE = \text{Standard Error} = \frac{\sigma}{\sqrt{n_x}}$$
We see an $SE$ in most statistical tests.
The $SE$ we have just found, is the one that applies to the z-test.
That is, if we want to know if 2 groups have a different population mean, we typically test if their sampling averages differ by at least $1.96 \cdot SE$. If they do, the groups are significantly different.
(The factor $1.96$ is the critical z-value for a 2-sided z-test with significance level $\alpha=5\%$.)
I see!!

#### Klaas van Aarsen

##### MHB Seeker
Staff member
The mean value of sampling variances is the same as the variance of population.
Basically that is why we have $n-1$ in the denominator for the sampling variance.
It is a correction so that we get the correct variance.

But the mean value of sampling standard deviations is not equal to the standard deviation of the population.

Why is it like that? Or do both have to be equal and I have done something wrong?
When adding independent variables, the result has a variance that is the sum of the variances.
And if we divide a variable by a constant, the result has a variance that is divided by the square of that constant.
So we can expect that variances behave predictably and consistently. "They add up" so to speak.

The standard deviation is the square root of the variance, and taking a square root is not a linear operation.
Consequently the mean of the sample standard deviations is not the square root of the mean of the sample variances.
That is, we can expect such standard deviations to be different from the population standard deviation.
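Written out for this exercise, as a sketch that assumes the two draws $X_1, X_2$ are independent with variance $\sigma^2$ (which holds for sampling with replacement):

$$\operatorname{Var}(aX) = a^2\operatorname{Var}(X) \quad\Longrightarrow\quad \sigma_{\bar x}^2 = \operatorname{Var}\!\left(\frac{X_1+X_2}{2}\right) = \frac{\sigma^2+\sigma^2}{4} = \frac{\sigma^2}{2} = \frac{108.5}{2} = 54.25$$

For the standard deviations, the square root is a concave function, so by Jensen's inequality

$$\mu_s = E\!\left[\sqrt{s^2}\right] \le \sqrt{E\!\left[s^2\right]} = \sqrt{\sigma^2} = \sigma,$$

which is consistent with the values found above: $8.22 \le 10.42$.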

#### mathmari

##### Well-known member
MHB Site Helper
Ahh ok!!

There is also another question:

If we have a sample of size $30$ (with replacement), what is the probability that the sample average is over $181$ cm?

I have done the following:

We have that the sample mean value is equal to the population mean $\mu_X=\mu=179.5$.

The sample standard deviation is equal to $\sigma_X=\frac{\sigma}{\sqrt{n}}=\frac{10.42}{\sqrt{30}}=1.90$.

The probability that the sample mean height is over $181$ is equal to \begin{align*}P(X > 181) &=1-P(X\leq 181)=1-P\left (Z\leq \frac{181-\mu_X}{\sigma_X}\right )=1-P\left (Z\leq \frac{181-179.5}{1.90}\right ) \\ & =1-P\left (Z\leq \frac{1.5}{1.90}\right )=1-P\left (Z\leq 0.79\right )=1-0.78524 \\ & =0.21476\end{align*} So the possibility that the sample mean height is over $181$ is equal to $21.476\%$.

Is everything correct?


#### Klaas van Aarsen

##### MHB Seeker
Staff member
There is also another question:

If we have a sample of size $30$ (with replacement), what is the probability that the sample average is over $181$ cm?

I have done the following:

We have that the sample mean value is equal to the population mean $\mu_X=\mu=179.5$.
Let me nitpick a bit...

The sample mean value is the average height of the sample of size $30$ yes?
Isn't it unlikely that it will be exactly the same as the population mean?

The sample standard deviation is equal to $\sigma_X=\frac{\sigma}{\sqrt{n}}=\frac{10.42}{\sqrt{30}}=1.90$.
Isn't the sample standard deviation equal to $s = \sqrt{\frac{\sum (x_i - \bar x)^2}{n-1}}$?
Did you perhaps mean the standard deviation of the sample-means? Also known as the Standard Error (SE)?

The probability that the sample mean height is over $181$ is equal to \begin{align*}P(X > 181) &=1-P(X\leq 181)=1-P\left (Z\leq \frac{181-\mu_X}{\sigma_X}\right )=1-P\left (Z\leq \frac{181-179.5}{1.90}\right ) \\ & =1-P\left (Z\leq \frac{1.5}{1.90}\right )=1-P\left (Z\leq 0.79\right )=1-0.78524 \\ & =0.21476\end{align*} So the possibility that the sample mean height is over $181$ is equal to $21.476\%$.
You wrote 'possibility'. Did you perhaps mean 'probability'?
Otherwise this looks correct.
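The same calculation can be reproduced without a z-table. This is a minimal Python sketch that assumes, as the calculation above does, that the mean of 30 draws is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

```python
from math import sqrt
from statistics import NormalDist

mu = 179.5           # population mean from the thread
sigma = sqrt(108.5)  # population standard deviation
n = 30               # sample size

se = sigma / sqrt(n)  # standard error of the sample mean

# Normal approximation for the sample mean (an assumption, as in the post above).
p_over_181 = 1 - NormalDist(mu, se).cdf(181)

print(round(se, 2), round(p_over_181, 4))
```

Since the z-value is not rounded to $0.79$ here, the result comes out marginally different from the table-based $0.21476$, but still between $0.21$ and $0.22$.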

#### mathmari

##### Well-known member
MHB Site Helper
You wrote 'possibility'. Did you perhaps mean 'probability'?
Oh yes, I meant probability.

The sample mean value is the average height of the sample of size $30$ yes?
Isn't it unlikely that it will be exactly the same as the population mean?

Isn't the sample standard deviation equal to $s = \sqrt{\frac{\sum (x_i - \bar x)^2}{n-1}}$?
Did you perhaps mean the standard deviation of the sample-means? Also known as the Standard Error (SE)?
Otherwise this looks correct.
So, in the probability calculation I used the correct values for the mean and the standard deviation, just with the wrong names?