# [SOLVED] Expected values and variances

#### mathmari

##### Well-known member
MHB Site Helper
Hey!! The variable $Y$ denotes the amount of money that an adult spends on Christmas presents.
The distribution of $Y$ depends on whether the person is employed ($E = 1$) or not ($E = 0$).
It holds that $P (E = 1) = p$, i.e. a randomly selected person is employed with probability $p$.

We have the following
\begin{align*}&E(Y\mid E=1)=\mu_1 \\ &V(Y\mid E=1)=\sigma_1^2 \\ &E(Y\mid E=0)=\mu_0 \\ &V(Y\mid E=0)=\sigma_0^2 \\ &E(Y)=\mu=p\mu_1+(1-p)\mu_0 \\ &V(Y)=\sigma^2=p\sigma_1^2+(1-p)\sigma_0^2+G\end{align*}
where \begin{equation*}G=p(\mu_1-\mu)^2+(1-p)(\mu_0-\mu)^2\geq 0\end{equation*}
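As a side remark that is not part of the exercise text, $G$ can be written more compactly. Substituting $\mu=p\mu_1+(1-p)\mu_0$ gives
\begin{align*}\mu_1-\mu &= (1-p)(\mu_1-\mu_0) \\ \mu_0-\mu &= -p(\mu_1-\mu_0) \\ G &= p(1-p)^2(\mu_1-\mu_0)^2+(1-p)p^2(\mu_1-\mu_0)^2 = p(1-p)(\mu_1-\mu_0)^2\end{align*}
so $G$ vanishes only if $\mu_1=\mu_0$ or $p\in\{0,1\}$.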

A research institute would like to estimate $\mu$ based on a sample of size $n$. The parameter $p$ is known to the institute. Two employees of the institute, A and B, discuss the procedure.

• A suggests questioning $n$ randomly selected people and using their average spend as an estimate for $\mu$.
• B proposes to separately survey $n p$ employed persons and $n (1 - p)$ unemployed persons, and then use the estimator \begin{equation*} \overline{Y}_B = p \overline{Y}_1 + (1-p) \overline{Y}_0 \end{equation*} where $\overline {Y}_1$ and $\overline{Y}_0$ are the average spend of the employed and unemployed persons, respectively. For the sake of simplicity, we assume that $n p$ and $n (1 - p)$ are integers.
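To make the two proposals concrete, here is a minimal Python sketch of both estimators. The function names `estimate_a` and `estimate_b` and the survey answers are made up for illustration; they do not come from the exercise.

```python
from statistics import fmean


def estimate_a(spend):
    """Estimator A: plain average over one simple random sample."""
    return fmean(spend)


def estimate_b(spend_employed, spend_unemployed, p):
    """Estimator B: group averages weighted by the known share p."""
    return p * fmean(spend_employed) + (1 - p) * fmean(spend_unemployed)


# Hypothetical survey answers for n = 5 and p = 0.6 (3 employed, 2 unemployed).
employed = [300.0, 260.0, 340.0]
unemployed = [100.0, 140.0]

y_a = estimate_a(employed + unemployed)
y_b = estimate_b(employed, unemployed, p=0.6)
```

If A's random sample happens to contain exactly $np$ employed persons, both estimators give the same number. The difference is that in A's procedure the number of employed persons in the sample is itself random, whereas B fixes it in advance.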

If I understand B's proposal correctly, we have a sample of size $n$ with $np$ employed and $n(1-p)$ unemployed persons. $Y_{1i}$ is the answer that employed person $i$ gives and $Y_{0i}$ is the answer that unemployed person $i$ gives. We calculate the mean of what the employed people spend, according to the survey, and call that average $\overline{Y}_1$, i.e. $\overline{Y}_1=\frac{1}{np}\sum_{i=1}^{np}Y_{1i}$. Similarly, it holds that $\overline{Y}_0=\frac{1}{n(1-p)}\sum_{i=1}^{n(1-p)}Y_{0i}$.
Adding these two averages, each multiplied by the respective probability, we get the average over all people.

Have I understood that correctly? I want to calculate the expected values and the variances of the estimates of A and B.

How could we do that? Could you give me a hint?

#### Klaas van Aarsen

##### MHB Seeker
Staff member
Hey mathmari!! For A we have $E(\overline {Y_A}) = E(Y)$ and $\sigma^2(\overline {Y_A}) = \frac{\sigma^2(Y)}{n}$, don't we?
And for B we have $E(\overline {Y_B}) = E(Y)$ and $\sigma^2(\overline {Y_B}) = \sigma^2(p \overline{Y_1} + (1-p) \overline{Y_0}) = p^2\frac{\sigma_1^2}{np} + (1-p)^2\frac{\sigma_0^2}{n(1-p)}$, don't we?

#### mathmari

Does it hold that $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ because $\overline {Y_A}$ and $\overline {Y_B}$ each describe the average amount of money?

Why do we not use here that $\overline{Y}_B=p \overline{Y_1} + (1-p) \overline{Y_0}$ ?

From $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ we get that both estimators are unbiased, right? To check which estimator is better we have to compare the two variances, right? The variance of estimator A is equal to the variance of the sample mean of the amounts of money. Does this mean that it is better than the variance of estimator B?
Or can we not compare them?

#### Klaas van Aarsen

> Does it hold that $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ because $\overline {Y_A}$ and $\overline {Y_B}$ each describe the average amount of money?
>
> Why do we not use here that $\overline{Y}_B=p \overline{Y_1} + (1-p) \overline{Y_0}$ ?

It follows mathematically.
Let's go through the steps for B, using indeed the formula for $\overline{Y_B}$.
$$E(\overline{Y_B})=E\left(p\overline{Y_1}+(1-p)\overline{Y_0}\right) =pE(\overline{Y_1})+(1-p)E(\overline{Y_0}) =p\mu_1 + (1-p)\mu_0 =\mu = E(Y)$$
Yes?

> From $E(\overline {Y_A}) = E(Y)$ and $E(\overline {Y_B}) = E(Y)$ we get that both estimators are unbiased, right?

Yep.

> To check which estimator is better we have to compare the two variances, right? The variance of estimator A is equal to the variance of the sample mean of the amounts of money. Does this mean that it is better than the variance of estimator B?
> Or can we not compare them?

Let's compare them.

\begin{align*}\sigma^2(\overline {Y_A}) &= \frac{\sigma^2(Y)}{n} = \frac{p\sigma_1^2+(1-p)\sigma_0^2+G}{n} \\ \sigma^2(\overline {Y_B}) &= \sigma^2(p \overline{Y_1} + (1-p) \overline{Y_0}) = p^2\frac{\sigma_1^2}{np} + (1-p)^2\frac{\sigma_0^2}{n(1-p)} =\frac{p\sigma_1^2 + (1-p)\sigma_0^2}{n}\end{align*}
So $\sigma^2(\overline {Y_B})$ is smaller than $\sigma^2(\overline {Y_A})$ by $G/n \geq 0$, isn't it? (The variance of B uses that the two subsamples are independent.) And if the standard deviations $\sigma_1$ and $\sigma_0$ are comparable to $|\mu_1 - \mu_0|$, then the standard deviation of B will be clearly smaller than that of A, and much smaller if they are small relative to $|\mu_1 - \mu_0|$.
That's assuming that both $p$ and $1-p$ are significantly greater than 0.
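If you want to see this numerically, here is a rough Monte Carlo sketch in Python. It is only an illustration under assumptions that are not in the exercise: spending within each group is taken to be normally distributed, and the values of $p$, $\mu_1$, $\mu_0$, $\sigma_1$, $\sigma_0$ and $n$ are made up. The function name `simulate` is hypothetical as well.

```python
import random
from statistics import fmean, pvariance


def simulate(p, mu1, mu0, s1, s0, n, reps, seed=0):
    """Return the empirical variances of estimators A and B over `reps` surveys."""
    rng = random.Random(seed)
    n1 = round(n * p)  # employed persons in B's sample
    n0 = n - n1        # unemployed persons in B's sample
    est_a, est_b = [], []
    for _ in range(reps):
        # A: the employment status of each respondent is random.
        sample = [
            rng.gauss(mu1, s1) if rng.random() < p else rng.gauss(mu0, s0)
            for _ in range(n)
        ]
        est_a.append(fmean(sample))
        # B: the group sizes are fixed in advance.
        y1 = fmean([rng.gauss(mu1, s1) for _ in range(n1)])
        y0 = fmean([rng.gauss(mu0, s0) for _ in range(n0)])
        est_b.append(p * y1 + (1 - p) * y0)
    return pvariance(est_a), pvariance(est_b)


# Made-up parameters (assumption): normal spending within each group.
p, mu1, mu0, s1, s0, n = 0.6, 300.0, 120.0, 50.0, 40.0, 50

g = p * (1 - p) * (mu1 - mu0) ** 2
var_a_theory = (p * s1**2 + (1 - p) * s0**2 + g) / n
var_b_theory = (p * s1**2 + (1 - p) * s0**2) / n

var_a_sim, var_b_sim = simulate(p, mu1, mu0, s1, s0, n, reps=4000)
print(f"A: theory {var_a_theory:.2f}, simulated {var_a_sim:.2f}")
print(f"B: theory {var_b_theory:.2f}, simulated {var_b_sim:.2f}")
```

With these numbers the formulas give $198.32$ for A and $42.8$ for B, and the simulated variances should land close to those values, up to Monte Carlo noise.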

#### mathmari

Ah ok!!


So, since the variance of A is bigger than that of B, A's estimates will fluctuate more than B's. Thus, the estimator of B is better, isn't it?

#### Klaas van Aarsen

Yep.

#### mathmari

Ok! Thank you very much!!