Here's why you should study causality: once you have, you can ask and answer better questions. For example, instead of merely noting that a hospital's appointments are down while some virus is spreading, you can ask the better question: is the virus causing appointment counts to go down? The new causality tools give you what you need to answer that question! The procedure is still inductive; causal inference does not turn induction into deduction. However, you're asking and answering the questions people really want answered: the "why" questions.

Here's how to learn the new causality. Prerequisites: probability and statistics, the more the better. If you've had a typical calculus-based course, you're certainly well prepared. However, the first book on the list requires only basic probability and statistics. If you want to be able to do all the computations yourself, you'll need more background to get through Books 2 and especially 3.

Study these three books, in this order.

1. *The Book of Why*, by Judea Pearl and Dana Mackenzie.
2. *Causal Inference in Statistics: A Primer*, by Judea Pearl, Madelyn Glymour, and Nicholas P. Jewell.
3. *Causality: Models, Reasoning, and Inference*, by Judea Pearl.

Teaser: contrary to the standard doctrine of traditional statistics, which I had learned, you do not always need a randomized controlled trial (RCT) to establish causality! With the right data, even an observational study can give you causality. This is how we know, for example, that smoking causes lung cancer, a case where the right RCT would be unethical.
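To make that teaser a little more concrete, here is a minimal sketch of adjusting for an observed confounder in simulated observational data. Everything in it is an assumption made up for illustration: the toy model (a binary confounder `C` that raises both the chance of treatment `T` and the chance of outcome `Y`), the probabilities, and the function names. It is not an analysis of the smoking data.

```python
import random


def simulate(n, seed=0):
    """Draw n (c, t, y) triples from a hypothetical model where C confounds T and Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5                      # confounder
        t = rng.random() < (0.8 if c else 0.2)      # treatment is more likely when C is present
        y = rng.random() < 0.1 + 0.2 * t + 0.5 * c  # true causal effect of T on Y is 0.2
        rows.append((int(c), int(t), int(y)))
    return rows


def mean_y(rows, t, c=None):
    """Average outcome among rows with treatment t (and confounder value c, if given)."""
    ys = [y for (ci, ti, y) in rows if ti == t and (c is None or ci == c)]
    return sum(ys) / len(ys)


def naive_effect(rows):
    """Plain difference in outcome rates; this mixes the causal effect with confounding."""
    return mean_y(rows, 1) - mean_y(rows, 0)


def adjusted_effect(rows):
    """Adjustment for C: average the within-stratum differences, weighted by P(C = c)."""
    total = 0.0
    for c in (0, 1):
        p_c = sum(1 for (ci, _, _) in rows if ci == c) / len(rows)
        total += p_c * (mean_y(rows, 1, c) - mean_y(rows, 0, c))
    return total


rows = simulate(100_000)
print("naive:   ", round(naive_effect(rows), 3))
print("adjusted:", round(adjusted_effect(rows), 3))
```

In this toy model the naive difference should land near 0.5, while the adjusted estimate should land near the true effect of 0.2. The adjustment only works because the assumed causal structure says `C` is the sole confounder and it is observed.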

Another teaser: Have you ever wondered how you can tell when to control for a possibly confounding variable or not? The new causality not only makes the whole concept of confounding much clearer, but tells you when you need to condition on a variable, and when NOT to condition on a variable! (Hint: sometimes conditioning on a variable gives you the WRONG answer!)
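Here is a small simulated sketch of the hint, using the simplest case I know of where conditioning goes wrong: a collider. The setup is hypothetical: `X` and `Y` are independent standard normals, and the collider is their sum. "Conditioning" is done crudely here, by keeping only the samples where the sum exceeds a threshold.

```python
import random


def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def collider_demo(n=20_000, seed=0, threshold=1.0):
    """Return (correlation in all samples, correlation after selecting on the collider)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of xs
    # The collider is X + Y; keep only the samples where it is large.
    keep = [(x, y) for x, y in zip(xs, ys) if x + y > threshold]
    sel_x = [x for x, _ in keep]
    sel_y = [y for _, y in keep]
    return corr(xs, ys), corr(sel_x, sel_y)


overall, selected = collider_demo()
print("correlation, all samples:     ", round(overall, 3))
print("correlation, collider > 1 only:", round(selected, 3))
```

Across all samples the correlation should be close to zero, as it must be for independent variables. Within the selected subset it should be clearly negative, even though neither variable affects the other: that is the wrong answer you get from conditioning on a collider.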

Highly recommended!

Find the expected salary of workers at skill level $Z=z$ had they received $x$ years of college education. [Hint: Use Theorem 4.3.2, with $e:Z=z,$ and the fact that for any two Gaussian variables, say $X$ and $Z,$ we have $E[X|Z=z]=E[X]+R_{XZ}(z-E[Z]).$ Use the material in Sections 3.8.2 and 3.8.3 to express all regression coefficients in terms of structural parameters, and show that $$E[Y_x|Z=z]=abx+\frac{bz}{1+a^2}.$$]

[Figure: bg4ml.png]

Here, $X$ is education, $Z$ is skill, and $Y$ is salary. The accompanying SEM is

\begin{align*}
X&=U_1\\
Z&=aX+U_2\\
Y&=bZ.
\end{align*}

We are called on to compute $E[Y_x|Z=z].$

Now Theorem 4.3.2 states: Let $\tau$ be the slope of the total effect of $X$ on $Y,$

$$\tau=E[Y|\operatorname{do}(x+1)]-E[Y|\operatorname{do}(x)];$$

then, for any evidence $Z=e,$ we have

$$E[Y_{X=x}|Z=e]=E[Y|Z=e]+\tau(x-E[X|Z=e]).$$

For our problem, with $e:Z=z,$ we have

$$E[Y_{X=x}|Z=z]=E[Y|Z=z]+\tau(x-E[X|Z=z]).$$

Not sure where to go from there.
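One possible way forward, sketched under an assumption the problem statement above does not spell out: that $U_1$ and $U_2$ are independent Gaussians with zero mean and unit variance. Under $\operatorname{do}(X=x)$ the model gives $Y=b(ax+U_2),$ so $E[Y|\operatorname{do}(x)]=abx$ and $\tau=ab.$ Since $Y=bZ,$ we have $E[Y|Z=z]=bz.$ The regression slope of $X$ on $Z$ is $R_{XZ}=\operatorname{Cov}(X,Z)/\operatorname{Var}(Z),$ and with all means zero, $E[X|Z=z]=R_{XZ}\,z.$ Putting the pieces together:

\begin{align*}
\operatorname{Cov}(X,Z)&=a,\qquad \operatorname{Var}(Z)=a^2+1,\\
R_{XZ}&=\frac{a}{1+a^2},\\
E[X|Z=z]&=\frac{az}{1+a^2},\\
E[Y_{X=x}|Z=z]&=bz+ab\left(x-\frac{az}{1+a^2}\right)\\
&=abx+bz\left(1-\frac{a^2}{1+a^2}\right)\\
&=abx+\frac{bz}{1+a^2}.
\end{align*}

This matches the target expression, which suggests the unit-variance assumption is the intended one, but I have not confirmed that against the book's conventions.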

Now I know that this is a non-deterministic counterfactual problem, which means the process should be:

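1. Abduction: use the evidence to update the distribution of the exogenous variables $U.$
2. Action: modify the model by replacing the equation for $X$ with $X=x.$
3. Prediction: use the modified model and the updated distribution of $U$ to compute the expectation of $Y.$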

So, for abduction, am I right in thinking that the only evidence we're using right now is $Z?$ In that case, we want to determine the $U_1$ and $U_2$ that correspond to $Z=z.$ We have the two equations

\begin{align*}
X&=U_1\\
z&=aX+U_2,
\end{align*}

or

\begin{align*}
X&=U_1\\
z-aX&=U_2.
\end{align*}

Without knowing the pre-condition value of $X,$ it's not clear how to continue. How do I continue? I'm also really not understanding the hint. Any thoughts about the hint?
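In the meantime, here is a rough Monte Carlo sketch of the three-step procedure that I could use to sanity-check the target formula numerically. It rests on assumptions of my own: $U_1$ and $U_2$ are taken to be independent standard normals, the parameter values are arbitrary, and abduction is approximated by rejection sampling (keeping only draws whose $Z$ falls within a small window around $z$). The function names are made up for this sketch.

```python
import random


def closed_form(a, b, x, z):
    """Target expression from the problem: abx + bz / (1 + a^2)."""
    return a * b * x + b * z / (1 + a ** 2)


def monte_carlo(a, b, x, z, n=200_000, half_width=0.05, seed=0):
    """Estimate E[Y_x | Z = z] by abduction (rejection), action, and prediction."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(n):
        u1 = rng.gauss(0, 1)   # assumed: U1 ~ N(0, 1)
        u2 = rng.gauss(0, 1)   # assumed: U2 ~ N(0, 1), independent of U1
        z_obs = a * u1 + u2    # factual world: X = U1, Z = aX + U2
        if abs(z_obs - z) > half_width:
            continue           # abduction: keep only U values consistent with Z close to z
        z_cf = a * x + u2      # action: set X = x, keeping the same U2
        total += b * z_cf      # prediction: Y_x = b * Z_x
        kept += 1
    return total / kept


a, b, x, z = 0.5, 2.0, 1.0, 1.0   # arbitrary illustrative values
print("Monte Carlo estimate:", round(monte_carlo(a, b, x, z), 3))
print("closed form:         ", round(closed_form(a, b, x, z), 3))
```

The point of the sketch is that abduction does not need a single pre-intervention value of $X$: the evidence $Z=z$ only reweights the joint distribution of $(U_1,U_2),$ and the counterfactual is averaged over that reweighted distribution.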

Thanks for your time!

Note: I have cross-posted this at Cross Validated:

https://stats.stackexchange.com/ques...on-calculation