Multiplication of conditional probability with several variables

Ronald_Ku · Aug 6, 2014

Dear All,

I am a beginner in machine learning and I am currently confused about the following problem:

What is the result of P(X|Y)P(Y|Z)?
In my book, it is written to be P(X|Z). But I don't think that is correct, since
P(X|Z)= P(X|Y,Z)P(Y|Z)
But clearly P(X|Y) ≠ P(X|Y,Z)

Assume that none of the events are independent.

I have simplified the problem in the above equation. The actual equation is
p(w|x,t,α,β) ∝ p(t|x,w,β)p(w|α), from Pattern Recognition and Machine Learning by Christopher M. Bishop.

Any help or ideas would be much appreciated.

Stephen Tashi · Aug 6, 2014

Ronald_Ku said:

since
P(X|Z)= P(X|Y,Z)P(Y|Z)

Are you saying the above is given as a special condition in the problem?

Or did you mean [itex] P( \ ( X \cap Y) | Z\ ) = P(X | \ (Y \cap Z)\ )\ P(Y | Z) [/itex] ?

Ronald_Ku · Aug 6, 2014

Stephen Tashi said:

Are you saying the above is given as a special condition in the problem?

Or did you mean [itex] P( \ ( X \cap Y) | Z\ ) = P(X | \ (Y \cap Z)\ )\ P(Y | Z) [/itex] ?

Yes, you are correct.
What I meant is P(x,y|z) = P(x|y,z)P(y|z)
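
For reference, a short derivation of that identity from the definition of conditional probability, assuming P(y,z) > 0:

[itex] P(x,y|z) = \frac{P(x,y,z)}{P(z)} = \frac{P(x,y,z)}{P(y,z)} \cdot \frac{P(y,z)}{P(z)} = P(x|y,z)\,P(y|z) [/itex]

Summing both sides over y gives [itex] P(x|z) = \sum_y P(x|y,z)\,P(y|z) [/itex], so a sum (or integral) over y is needed before the left side becomes P(x|z).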

Stephen Tashi · Aug 6, 2014

Ronald_Ku said:

what is the result of P(X|Y)P(Y|Z)?
In my book, it is written to be P(X|Z)

I don't see why that would be correct. Perhaps you need to explain the entire context for it. I don't have a copy of Bishop's book.
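
A small numerical check may help here. The sketch below uses a made-up joint distribution over three binary variables (X is a copy of Z, and Y is an independent fair coin); the distribution and the helper names `prob` and `cond` are assumptions chosen for illustration, not anything from the book. It compares P(X|Z) with the sum over y of P(X|y,Z)P(y|Z) and with the sum over y of P(X|y)P(y|Z).

```python
import itertools

# Hypothetical joint distribution p(x, y, z): X copies Z, Y is an independent fair coin.
joint = {
    (x, y, z): (0.25 if x == z else 0.0)
    for x, y, z in itertools.product([0, 1], repeat=3)
}
NAMES = ("x", "y", "z")

def prob(**fixed):
    """Probability of the event that fixes the named variables to the given values."""
    return sum(
        p for state, p in joint.items()
        if all(state[NAMES.index(name)] == value for name, value in fixed.items())
    )

def cond(target, given):
    """P(target | given), with both arguments passed as dicts such as {"x": 1}."""
    return prob(**target, **given) / prob(**given)

x, z = 1, 1
p_x_given_z = cond({"x": x}, {"z": z})

# Sum over y of P(x | y, z) P(y | z): this always equals P(x | z).
with_z = sum(cond({"x": x}, {"y": y, "z": z}) * cond({"y": y}, {"z": z}) for y in (0, 1))

# Sum over y of P(x | y) P(y | z): equals P(x | z) only if X is independent of Z given Y.
without_z = sum(cond({"x": x}, {"y": y}) * cond({"y": y}, {"z": z}) for y in (0, 1))

print(p_x_given_z, with_z, without_z)
```

In this example the first two numbers agree and the third does not, so dropping Z from the first factor needs an extra conditional independence assumption.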

Ronald_Ku · Aug 7, 2014

It is in the introductory chapter of the book, in the discussion of polynomial curve fitting.
X, T refer to the training set, while t refers to the predicted value at position x.
w refers to the parameters of the M-th order polynomial
y(x,w) = w0 + w1*x + w2*x^2 + . . . + wM*x^M

The book claims the following equation for predicting t with the help of the training set and the position x:
[itex] p(t|x, X, T) = \int p(t|x,w) p(w|X, T) dw [/itex]

That means p(t|x,w)p(w|X, T) = p(t,w|x,X, T), which is then marginalized over w.
But I believe that p(t|x,w) ≠ p(t|x,w,X, T).

If this is not clear enough, I can explain more.

Stephen Tashi · Aug 7, 2014

Ronald_Ku said:

[itex] p(t|x, X, T) = \int p(t|x,w) p(w|X, T) dw [/itex]

To make sense of an expression denoting a probability, we must understand what the "probability space" is. Can you describe the space associated with the notation p(t,x,X,T)? Is it possible that some of those variables are not random variables, but ordinary variables instead? For example, if I have 3 loaded dice then I might use the notation
p( X,k)
to mean "the probability of getting a result of X when I roll the k-th die". That interpretation doesn't imply that "k" is a random variable. It doesn't imply that there is an experiment where I pick a die at random.
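
The dice example can be sketched in code. The three face distributions below are invented for illustration, and `p` and `p_marginal` are hypothetical names. The point is that `p(face, k)` is a family of distributions indexed by an ordinary variable k, and k only becomes a random variable if a second experiment (picking a die at random) is added.

```python
from fractions import Fraction

# Hypothetical loaded dice: die k has its own distribution over faces 1..6.
dice = {
    1: [Fraction(1, 6)] * 6,                      # fair die
    2: [Fraction(1, 2)] + [Fraction(1, 10)] * 5,  # loaded towards 1
    3: [Fraction(1, 10)] * 5 + [Fraction(1, 2)],  # loaded towards 6
}

def p(face, k):
    """Probability of rolling `face` with the k-th die; k is an index, not a random variable."""
    return dice[k][face - 1]

# For each fixed k, p(., k) is a distribution over faces.
sums_over_faces = {k: sum(p(face, k) for face in range(1, 7)) for k in dice}

# Summing over k for a fixed face does not give 1, so p is not a joint distribution of (face, k).
sum_over_k = sum(p(6, k) for k in dice)

# Only after adding an assumed experiment "pick a die uniformly at random" is k random.
prior_k = {k: Fraction(1, 3) for k in dice}

def p_marginal(face):
    """Probability of `face` when the die itself is chosen at random according to prior_k."""
    return sum(p(face, k) * prior_k[k] for k in dice)

print(sums_over_faces, sum_over_k, p_marginal(6))
```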

Ronald_Ku · Aug 7, 2014

Let me check that I understand you: in the expression p(x|m,n), m and n are not necessarily random variables. They can be parameters. Whether a quantity is a random variable depends on how the experiment is set up, right?
In your case, k can be a random variable, and p(x,k) then means the probability of picking the k-th die at random and getting the result x, if the experiment is set up that way.

I am not sure which applies in my case.
In my case, the notation p(t|x, X, T) means
the probability of finding t, given the training set X,T and the position x. t is obviously a random variable. But x,X,T could also be parameters. It is not explicitly stated whether they are random variables or parameters. The experiment could be predicting t at position x, given a fixed set X,T. Or the experiment could be predicting t while picking x,X,T at random and then considering P(t|x,X,T). I don't know which experiment the author has in mind.

Stephen Tashi · Aug 7, 2014

The fact that a p(...) notation can be interpreted in various ways doesn't mean that an equation using it will be correct for each possible interpretation. I suppose an author might use ambiguous notation to assert that a whole family of equations is correct by writing one equation. In your case, I'll guess the author only has one specific interpretation in mind.

One way to make sense of:

[itex] p(t|x, X, T) = \int p(t|x,w) p(w|X,T) dw [/itex]

is to consider [itex] X,T [/itex] to be ordinary variables, not random variables. So within the equation [itex] X,T [/itex] can be treated as if they have some constant value.

The random variable [itex] t [/itex] is a function only of the random variables [itex] x [/itex] and [itex] w [/itex]
(i.e. [itex] t = w_0 + w_1 x + ... + w_M x^M [/itex]). So the notation [itex] p(t|x,w) [/itex] means the same thing as [itex] p(t|x,w,X,T) [/itex] because [itex] t [/itex] has no random variation due to [itex] X, T [/itex].

But by that interpretation, the author could have written [itex] p(w | X,T) [/itex] as [itex] p(w) [/itex]. I suppose he needed to mention [itex] X, T [/itex] somewhere on the right hand side.

Leaving [itex] X,T [/itex] unmentioned, it isn't controversial that

[itex] p(t|x) = \int p(t|x,w) p(w) dw [/itex]

or, mentioning them everywhere, that

[itex] p(t|x,X,T) = \int p(t|x,w,X,T) p(w| X,T) dw [/itex]
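
To connect the last form with the equation quoted from the book, here is a sketch of the steps, with the two conditional independence assumptions written out. These assumptions are my reading of the curve-fitting model, not something stated in this thread.

[itex] p(t|x,X,T) = \int p(t,w|x,X,T)\, dw = \int p(t|x,w,X,T)\, p(w|x,X,T)\, dw [/itex]

Assumption 1: once [itex] w [/itex] and [itex] x [/itex] are given, the training data add no further information about [itex] t [/itex], so [itex] p(t|x,w,X,T) = p(t|x,w) [/itex].

Assumption 2: the new input [itex] x [/itex] alone carries no information about [itex] w [/itex], so [itex] p(w|x,X,T) = p(w|X,T) [/itex].

Substituting both gives [itex] p(t|x,X,T) = \int p(t|x,w)\, p(w|X,T)\, dw [/itex].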

Ronald_Ku · Aug 8, 2014

Thanks so much. I will try to proceed in this direction and see whether anything weird occurs again.

Ronald_Ku · Aug 8, 2014

I have another question.
How should the above equations be read together with the following equation?
p(w|X, T, α, β) ∝ p(T|X,w, β)p(w|α).------(a)
α, β are fixed.

The left-hand side p(w|X,T) is the posterior probability. On the right-hand side, p(w) is the prior probability.
So X,T are random variables, right?
The book mentions that p(w|X,T) in the integral is given by (a).
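
To make (a) concrete, here is a minimal sketch under simplifying assumptions of my own: a single weight w with model t = w*x plus Gaussian noise of precision β, and a zero-mean Gaussian prior on w with precision α, rather than the book's full polynomial. The data values, `alpha`, `beta`, and all function names are made up. The posterior is obtained by multiplying likelihood and prior on a grid and normalizing, then compared with the closed-form Gaussian posterior mean for this simplified model.

```python
import math

# Hypothetical fixed hyperparameters and training data.
alpha, beta = 2.0, 25.0
X = [0.0, 0.5, 1.0, 1.5]
T = [0.1, 0.4, 1.1, 1.4]

def gaussian(v, mean, precision):
    """Density of a normal distribution parameterized by mean and precision (1/variance)."""
    return math.sqrt(precision / (2 * math.pi)) * math.exp(-0.5 * precision * (v - mean) ** 2)

def unnormalized_posterior(w):
    """Right-hand side of (a): p(T | X, w, beta) * p(w | alpha)."""
    likelihood = math.prod(gaussian(t, w * x, beta) for x, t in zip(X, T))
    prior = gaussian(w, 0.0, alpha)
    return likelihood * prior

# Evaluate on a grid over w and normalize so that the posterior integrates to 1.
step = 0.001
grid = [-3.0 + i * step for i in range(6001)]
values = [unnormalized_posterior(w) for w in grid]
evidence = sum(values) * step
posterior = [v / evidence for v in values]
grid_mean = sum(w * p for w, p in zip(grid, posterior)) * step

# Closed form for this simplified model: the posterior is Gaussian with this precision and mean.
post_precision = alpha + beta * sum(x * x for x in X)
closed_mean = beta * sum(x * t for x, t in zip(X, T)) / post_precision

print(grid_mean, closed_mean)
```

Here X and T enter only as observed, fixed numbers; the proportionality in (a) hides the normalizing constant, which the sketch computes as `evidence`.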

Stephen Tashi · Aug 8, 2014

Ronald_Ku said:

So X,T are random variables. Right?
In the book, it mentions that p(w|X,T) in the integral will be given by (a)

It isn't possible to interpret equations without some context. Establishing the context requires a verbal explanation.
A person who is familiar with the type of problem that Bishop is solving might understand his notation, but I haven't read a statement of what these equations are supposed to accomplish.

An elementary question that needs a verbal explanation is whether the p(...) notation is supposed to indicate the probability of an event or whether it is supposed to denote a probability density function evaluated somewhere. (The value of a density function evaluated at a point isn't equal to "the probability of" that point.)
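
A quick illustration of that last remark, using a normal distribution with a small standard deviation (the value 0.1 and the function names are chosen only for this example): the density at the mean is larger than 1, so it cannot be a probability, while the probability of an interval around the mean is still below 1.

```python
import math

def normal_pdf(v, mean, sigma):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((v - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(v, mean, sigma):
    """Cumulative distribution function of the same normal distribution."""
    return 0.5 * (1 + math.erf((v - mean) / (sigma * math.sqrt(2))))

sigma = 0.1
density_at_mean = normal_pdf(0.0, 0.0, sigma)  # a density value, not a probability
prob_interval = normal_cdf(0.05, 0.0, sigma) - normal_cdf(-0.05, 0.0, sigma)

print(density_at_mean, prob_interval)
```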
