
Predicting Z-Score

dlee

New member
Apr 13, 2013
4
Consider two random variables X,Y whose correlation is ρ = 0.7 (and the joint PMF is football shaped). Predict the z-score for Y if you observe that X is at the 30th percentile (assuming X ~ N(4,4)).

The solution to this problem is -0.364, but I'm not sure how to approach this answer.
 

Klaas van Aarsen

MHB Seeker
Staff member
Mar 5, 2012
8,774
Re: Correlation?

Consider two random variables X,Y whose correlation is ρ = 0.7 (and the joint PMF is football shaped). Predict the z-score for Y if you observe that X is at the 30th percentile (assuming X ~ N(4,4)).

The solution to this problem is -0.364, but I'm not sure how to approach this answer.
If we assume a bivariate normal distribution, we "expect" the relation:
$$y(x) = \text{sgn}(\rho) \frac {\sigma_Y}{\sigma_X} (x - \mu_X) + \mu_Y$$

With X at the 30th percentile, that means $z_X = \frac{x - \mu_X}{\sigma_X} = \text{invNorm}(0.30) = -0.524$.

In other words, the z-score for Y is
$$z_Y = \frac{y - \mu_Y}{\sigma_Y} = \text{sgn}(\rho) z_X = -0.524$$

I don't know how they got to -0.364.
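
The inverse-normal step can be checked with a minimal Python sketch that uses only the standard library. It assumes that N(4,4) means mean 4 and variance 4 (so $\sigma_X = 2$); the variable names are illustrative and not part of the problem.

```python
from statistics import NormalDist

# z-score of the 30th percentile of a standard normal
z_x = NormalDist().inv_cdf(0.30)

# Assumption: N(4, 4) is read as mean 4, variance 4, so sigma_X = 2
mu_x, sigma_x = 4.0, 2.0
x = mu_x + sigma_x * z_x  # the raw x value at the 30th percentile

print(round(z_x, 3))  # -0.524
print(round(x, 3))
```

Note that the mean and standard deviation of X only affect the raw value `x`; the z-score itself does not depend on them.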
 

zzephod

Well-known member
Feb 3, 2013
134
Re: Correlation?

If we assume a bivariate normal distribution, we "expect" the relation:
$$y(x) = \text{sgn}(\rho) \frac {\sigma_Y}{\sigma_X} (x - \mu_X) + \mu_Y$$

With X at the 30th percentile, that means $z_X = \frac{x - \mu_X}{\sigma_X} = \text{invNorm}(0.30) = -0.524$.

In other words, the z-score for Y is
$$z_Y = \frac{y - \mu_Y}{\sigma_Y} = \text{sgn}(\rho) z_X = -0.524$$

I don't know how they got to -0.364.
That can't be right.

You can without loss of generality assume \(\displaystyle \mu_X = \mu_Y = 0\), so we have a model:

$$y=\alpha x + \varepsilon$$

where $\varepsilon$ is zero-mean noise uncorrelated with $x$. Then $\displaystyle E(XY)=\alpha\; \sigma_X^2$, and $\rho=E(XY)/(\sigma_X \sigma_Y)=\alpha\; \sigma_X/\sigma_Y$

Hence: $$\alpha=\rho \frac{\sigma_Y}{\sigma_X}$$...

 

Klaas van Aarsen

MHB Seeker
Staff member
Mar 5, 2012
8,774
Re: Correlation?

That can't be right.

You can without loss of generality assume \(\displaystyle \mu_X = \mu_Y = 0\)
I didn't.
The problem asks for a z-score, meaning $\mu_X$ and $\mu_Y$ get eliminated (see my derivation).

so we have a model:

$$y=\alpha x + \varepsilon$$

where $\varepsilon$ is zero-mean noise uncorrelated with $x$. Then $\displaystyle E(XY)=\alpha\; \sigma_X^2$, and $\rho=E(XY)/(\sigma_X \sigma_Y)=\alpha\; \sigma_X/\sigma_Y$

Hence: $$\alpha=\rho \frac{\sigma_Y}{\sigma_X}$$...
Well... multiplying by 0.7 almost gives the requested result.
But that won't be right.
 

zzephod

Well-known member
Feb 3, 2013
134
Re: Correlation?

... Well... multiplying by 0.7 almost gives the requested result.
But that won't be right.
It will be if you use the "nearest value" when looking up the inverse normal in a table.
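
To illustrate the table-lookup effect, the sketch below compares the exact inverse normal with a value rounded to two decimals, as a printed z-table would give. Rounding to two decimals is an assumption about the table used; the problem statement does not say which table was intended.

```python
from statistics import NormalDist

rho = 0.7
z_exact = NormalDist().inv_cdf(0.30)  # about -0.5244
z_table = round(z_exact, 2)           # nearest two-decimal table entry

print(round(rho * z_exact, 3))  # -0.367
print(round(rho * z_table, 3))  # -0.364
```

With the rounded table value, multiplying by $\rho = 0.7$ reproduces the quoted -0.364.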


zzephod

Well-known member
Feb 3, 2013
134
Re: Correlation?

I didn't.
The problem asks for a z-score, meaning $\mu_X$ and $\mu_Y$ get eliminated (see my derivation).
Well, since you failed to set up a model with the correct correlation, it is relevant to make an observation that simplifies setting the correlation without changing the answer.


Klaas van Aarsen

MHB Seeker
Staff member
Mar 5, 2012
8,774
Re: Correlation?

Well, since you failed to set up a model with the correct correlation, it is relevant to make an observation that simplifies setting the correlation without changing the answer.

The model is a positive sloped football that could be anywhere.
The problem puts the heart at x=4 with a variance of 4.
The y coordinate of the heart and the slope can still be freely chosen.
Then, with the given correlation, the "width" of the football becomes fixed.

Either way, when talking about the z-score of y, all these choices become moot, since they are standardized.
The relationship between $E(z_Y|z_X)$ and $z_X$ is simply $E(z_Y|z_X) = z_X$, whichever model you pick.
This is a "standardized" football that is aligned on the line y=x with a width such that the correlation is satisfied.
 

zzephod

Well-known member
Feb 3, 2013
134
Re: Correlation?

The model is a positive sloped football that could be anywhere.
The problem puts the heart at x=4 with a variance of 4.
The y coordinate of the heart and the slope can still be freely chosen.
Then, with the given correlation, the "width" of the football becomes fixed.

Either way, when talking about the z-score of y, all these choices become moot, since they are standardized.
The relationship between $E(z_Y|z_X)$ and $z_X$ is simply $E(z_Y|z_X) = z_X$, whichever model you pick.
This is a "standardized" football that is aligned on the line y=x with a width such that the correlation is satisfied.
For bivariate normal random variables $X,\ Y$ with zero means:

$$E(Y|X)=\rho\; \frac{\sigma_Y}{\sigma_X}X$$

So as $z_X,\ z_Y$ have the same correlation coefficient as $X$ and $Y$ we have:

$$E(z_Y|z_X) = \rho\; z_X$$

See: http://athenasc.com/Bivariate-Normal.pdf.

... And simulation confirms this.
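
A simulation of this kind could look like the following Python sketch. This is not the code used in the thread; it is an illustrative setup, assuming standardized bivariate normal pairs built as $z_Y = \rho z_X + \sqrt{1-\rho^2}\,\varepsilon$ with independent standard normal $\varepsilon$. The sample size, seed, and window width are arbitrary choices.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable
rho = 0.7
n = 200_000

# Build standardized pairs (z_x, z_y) with correlation rho
pairs = []
for _ in range(n):
    z_x = random.gauss(0.0, 1.0)
    noise = random.gauss(0.0, 1.0)
    z_y = rho * z_x + sqrt(1.0 - rho**2) * noise
    pairs.append((z_x, z_y))

# Least-squares slope of z_y on z_x; this estimates the factor
# relating E(z_Y | z_X) to z_X
mean_x = sum(a for a, _ in pairs) / n
mean_y = sum(b for _, b in pairs) / n
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in pairs) / n
var_x = sum((a - mean_x) ** 2 for a, _ in pairs) / n
slope = cov_xy / var_x

# Average z_y among samples whose z_x lies near the 30th percentile
target = -0.524
near = [b for a, b in pairs if abs(a - target) < 0.05]
cond_mean = sum(near) / len(near)

print(slope, cond_mean)
```

Under these assumptions the slope should come out close to $\rho = 0.7$ and the conditional average close to $0.7 \times (-0.524) \approx -0.367$, rather than $-0.524$.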
