Multiplication of marginal densities

In summary, the mutual information between two random variables measures how much knowing one of them tells us about the other. It is zero when the variables are independent and increases as they become more dependent on each other.
  • #1
architect
Hi,

I am trying to find out the significance of the result of multiplying two marginal densities that originate from integrating the joint density function that connects them. To be more specific, let's say we have a joint density function of two random variables. One integrates the joint density with respect to one variable in order to obtain the marginal density of the other. If one does this for both variables, two marginal densities are obtained. What would the product of these two marginal densities signify?

Although the product is itself a joint density function, it assumes independence, so I am not sure that it represents the original joint density function.

Thanks for your time and help.

BR,

Alex
 
  • #2
The product of the density functions of random variables X and Y is the joint density of (X,Y) when the random variables are sampled independently. Knowing the two marginal densities of an unknown joint density function f(X,Y) of two non-independent random variables is not sufficient information to reconstruct the joint density.
 
  • #3
The product of marginals is the joint under an independence assumption. Comparisons between the joint and the product-of-marginals distributions can be useful in identifying "just how" dependent two random variables are. For instance, in information theory, the "mutual information" of two distributions (a measure of how much each tells us about the other) is obtained by taking the relative entropy of the actual joint distribution with respect to the product-of-marginals distribution.

On its own, the product of marginals tells you nothing; only when compared to the actual joint does it give useful information.
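
Written out, the quantity is the relative entropy (Kullback-Leibler divergence) of the joint with respect to the product of the marginals:

$$I(X;Y) = D_{\mathrm{KL}}\big(f_{X,Y}\,\|\,f_X f_Y\big) = \iint f_{X,Y}(x,y)\,\log\frac{f_{X,Y}(x,y)}{f_X(x)\,f_Y(y)}\,dx\,dy$$

It is zero exactly when f_{X,Y}(x,y) = f_X(x) f_Y(y) almost everywhere, i.e. when X and Y are independent.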
 
  • #4
Firstly, I would like to thank you for your replies.

In other words (and as alexfloo pointed out), the product of two marginal densities obtained from a known joint density function by means of integration tells us nothing about the original joint density itself.

However, by computing the relative entropy between the two distributions (the original joint and the distribution resulting from the product of the two marginals), the mutual information might be obtained.

My aim here was just that: to find out how much the joint distribution of the two variables deviates from the distribution resulting from the product of the two marginals, which I now hope I can estimate by applying the relative entropy definition.

I hope I got this right.

Thanks once more,

Alex.
 
  • #5
Do you have a formula for the densities, or are you working with experimental data? In the first case, I'd first see if you can factor the joint. If it can be written as the product of a function only of x times a function only of y, then those two functions are necessarily the marginals (up to scaling constants), and the random variables are independent.

If you're working with experimental data, check the covariance first. If that's inconclusive, then try mutual information.
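
In case it helps, here is a minimal Python sketch of those two checks, assuming the paired samples are available as NumPy arrays x and y (the helper name and the crude histogram-based MI estimate are only illustrative):

```python
import numpy as np

def dependence_checks(x, y, bins=20):
    """Quick dependence checks for paired samples x, y (illustrative helper)."""
    # Linear dependence first: sample covariance and correlation.
    cov = np.cov(x, y)[0, 1]
    corr = np.corrcoef(x, y)[0, 1]

    # Crude histogram-based mutual information estimate (in nats).
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
    return cov, corr, mi
```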

What exactly are you working with?
 
  • #6
No, unfortunately we cannot factor the joint density into the product of two marginal densities, since the two random variables are not independent. Also, I am not working with experimental data. To be more precise, the joint density function that I am dealing with is the one obtained by transforming a bivariate Gaussian density function into polar coordinates.
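
(Concretely, the density I mean comes from the change of variables f_{R,Θ}(r, θ) = r · f_{X,Y}(r cos θ, r sin θ). A minimal sketch of evaluating it in Python, with illustrative names, mu a length-2 mean vector and Sigma a 2×2 covariance matrix:)

```python
import numpy as np

def polar_joint_density(r, theta, mu, Sigma):
    """Joint density of (R, Theta) when X = R cos(Theta), Y = R sin(Theta)
    and (X, Y) is bivariate Gaussian; the Jacobian of the transform is r."""
    xy = np.array([r * np.cos(theta), r * np.sin(theta)])
    diff = xy - np.asarray(mu)
    inv_Sigma = np.linalg.inv(Sigma)
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))
    return r * norm * np.exp(-0.5 * diff @ inv_Sigma @ diff)
```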

I think that the mutual information is the way forward, which I will shortly attempt to estimate.



Alex
 
  • #7
Another approach is to look at the cumulative distributions and find the Kolmogorov-Smirnov distance between F(x,y) and F(x,inf)F(inf,y), which is just the maximum absolute value of the difference.
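
A minimal sketch of that comparison, assuming the joint density has been tabulated on a grid as pdf_grid with spacings dx and dy (names are illustrative):

```python
import numpy as np

def ks_independence_distance(pdf_grid, dx, dy):
    """Sup-norm distance between F(x, y) and F(x, inf) * F(inf, y) on a grid;
    pdf_grid[i, j] approximates f(x_i, y_j)."""
    cell = pdf_grid * dx * dy                        # probability mass per cell
    F = np.cumsum(np.cumsum(cell, axis=0), axis=1)   # joint CDF F(x_i, y_j)
    Fx = F[:, -1][:, None]                           # marginal CDF of X
    Fy = F[-1, :][None, :]                           # marginal CDF of Y
    return np.max(np.abs(F - Fx * Fy))
```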
 
  • #8
architect said:
the joint density function that I am dealing with is the one obtained after the transformation of a bi-variate Gaussian density function into polar co-ordinates.

I can't resist asking why you chose to transform the data to polar coordinates. From the point of view of representing the data as two independent random variables, it would seem more natural to pick a transformation to Cartesian coordinates that makes the X and Y coordinates independent random variables.
 
  • #9


Further to this question, I would like to ask for some more details regarding the interpretation of a mutual information graph. As mentioned in an earlier reply: "The product of marginals is the joint under an independence assumption. Comparisons between the joint and the product-of-marginals distributions can be useful in identifying "just how" dependent two random variables are. For instance, in information theory, the "mutual information" of two distributions (a measure of how much each tells us about the other) is obtained by taking the relative entropy of the actual joint distribution with respect to the product-of-marginals distribution."

I have now computed the mutual information (MI) using the appropriate formula. I varied one of the parameters of the distribution (say p) and performed the numerical integration for each value, so that I could plot MI against p and observe the result. Please find the resulting graph attached. It looks like a skewed distribution, and the question now is how to interpret this graph. What can we say about it? How much information do the joint and the product of marginals share?

We see that for large values of p the MI decreases significantly. Perhaps this indicates that for large values of p the product of marginals is a good approximation to the joint. Any other hints? Also, the MI only reaches about 0.035 (for smaller values of p), which is a small value, but small relative to what? For example, this maximum value (0.035) may be small enough for practical purposes, in which case the product of marginals may give an acceptable error.
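
To be concrete, the computation is along these lines (a rough sketch only; joint_density stands in for the actual tabulated polar-coordinate density, and the MI comes out in nats, so I divide by ln 10 to get base 10):

```python
import numpy as np

def mutual_information(joint, dx, dy):
    """MI (in nats) of a joint density tabulated on a grid with spacings dx, dy."""
    mass = joint * dx * dy
    mass = mass / mass.sum()                  # renormalise away truncation error
    px = mass.sum(axis=1, keepdims=True)
    py = mass.sum(axis=0, keepdims=True)
    nz = mass > 0
    return np.sum(mass[nz] * np.log(mass[nz] / (px @ py)[nz]))

# Sweep the parameter p; joint_density(R, TH, p) is a stand-in for the density
# evaluated on meshgrid arrays R, TH with spacings dr, dth.
# mi_curve = [mutual_information(joint_density(R, TH, p), dr, dth) / np.log(10)
#             for p in p_values]
```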

Thanks once more for your time in reading this. Please see attachment mi.jpg. The logarithm base was set to 10.

Bests,

Alex
 

Attachments

  • mi.jpg (plot of the computed mutual information against the parameter p)
  • #10
There is always a difficulty in real-life situations in defining what is meant by a "small" or "large" error, and also in defining how "error" is measured. With information theory, there is the added problem that "information" (as far as I can tell) is not a universal property of a real-life object or situation (in the way that mass or temperature are).

For example, suppose we randomly select a number according to the following probabilities:
p(1) = 0.3
p(2) = 0.2
p(3) = 0.5

Suppose we win the following amount of money as a function of the number selected
1 wins $5
2 wins $5
3 wins $100

The entropy of the probability distribution for the numbers is not the same as the entropy of the probability distribution for the winnings. If the rules for the amount won were changed to
1 wins $5
2 wins $5
3 wins $6

Then there seems to have been a fundamental change in the situation, but this is not reflected by any change in the entropy of the probability distribution for the amount won.
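
To make that concrete (using base 2 just for illustration): the winnings are $5 with probability 0.5 and either $100 or $6 with probability 0.5, so

$$H(\text{number}) = -(0.3\log_2 0.3 + 0.2\log_2 0.2 + 0.5\log_2 0.5) \approx 1.49 \text{ bits}$$
$$H(\text{winnings}) = -(0.5\log_2 0.5 + 0.5\log_2 0.5) = 1 \text{ bit under either payoff rule}$$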

So I can't make any useful interpretation of a mutual information graph, or of what it implies for "good" or "bad" approximations. The first step would be to define how you intend to measure "error". For example, if the "true" value is x and the "predicted" value is w, will the error be measured by (w - x)^2, or by |w - x|, or perhaps by some more complicated "payoff" or "penalty" function of w and x?
 
  • #11
Dear Stephen,

thanks for your reply. I think your question about the measure of error is answered by the mutual information itself, since the mutual information can be expressed as the Kullback-Leibler divergence, which indicates the "distance" between two distributions. My intention is to give a measure of the distance between the two distributions so that one can conclude that under certain parameter settings the assumption of independence may or may not be claimed. Such is the case in the previously attached figure for p = 5, where the mutual information is at its maximum, albeit the value obtained is small.

As a last thought, to put the 0.035 value that corresponds to p = 5 into perspective, I thought of comparing it with the entropy of one of the random variables comprising the joint distribution.

If you think that this makes sense, then please let me know.

BR,

Alex
 
  • #12
architect said:
My intention is to give a measure of the distance between the two distributions so that one can conclude that under certain parameter settings the assumption of independence may or may not be claimed.
I don't recall your explaining the distribution for which p is a parameter, or whether the plot is the result of computation or simulation. At any rate, it isn't clear what mathematical problem is posed by your goal to "conclude that under certain parameter settings the assumption of independence may or may not be claimed". If you mean that when the mutual information between two variables is zero then you can say they are independent, that's true. If you have some idea that a mutual information that is "small" implies the two variables are "nearly" independent, you have to give some criteria for what "small" and "nearly" mean. If the plots you make are from statistical sampling and you want to do some sort of statistical "hypothesis test" to "accept" or "reject" the idea that the two variables are independent, then this is yet a different type of mathematical problem.

architect said:
Such is the case in the previously attached figure for p = 5, where the mutual information is at its maximum, albeit the value obtained is small.
I understand the plot shows mutual information vs a parameter, but are you saying that you wish to conclude something about the two random variables based on the maximum mutual information, which is the value produced at p = 5?

architect said:
As a last thought, to put the 0.035 value that corresponds to p = 5 into perspective, I thought of comparing it with the entropy of one of the random variables comprising the joint distribution.

"Comparing" is a mathematically ambiguous term. It could mean taking a ratio or a difference or simply looking at which of two things is greater. I also don't know what judgments are to be made on the basis of this comparison.
 
  • #13
Yes, p is a parameter of the distribution, and the plot is the result of computation, not simulation.

If you have some idea that a mutual information that is "small" implies the two variables are "nearly" independent, you have to give some criteria for what "small" and "nearly" mean.

This is exactly what I am trying to achieve, and it is the reason I proposed the comparison with the entropy of one of the random variables: to give some criterion for what "nearly" independent means in this case, i.e. a measure of the extent of "correlation" between the two.

I understand the plot shows mutual information vs a parameter, but are you saying that you wish to conclude something about the two random variables based on the maximum mutual information, which is the value produced at p = 5?

Yes, this is exactly what I mean.


BR,

Alex
 
  • #14
As far as I know, there are no mathematical results that establish any "standard" scale for mutual information that defines what "small" differences are or what "nearly" mutually independent means. It's similar to the case of evaluating a procedure for approximating a function or an area: the mathematical results concern limits; they don't establish that 0.001 or 0.0000001 is a "small" number, or even that 0.0000001 is a "small" percentage. The question of what constitutes a "small" amount depends on the practical application.

If you are willing to stipulate what a "small" probability is, then it might be possible to find a result that falls in the pattern:

Let X and Y be random variables with a joint distribution that is in the family of distributions given by f(X,Y,rho) where rho is a parameter. Let g(X,rho) and h(Y,rho) be the respective marginal densities.

If the mutual information between X and Y is less than Delta when rho = rho_zero, then the probability of the event { (x, y) : |f(x, y, rho_zero) - g(x, rho_zero) h(y, rho_zero)| > 0.04 } is less than Delta^2.

Off hand, I don't know of any such results, but that type of result would connect information to probability.

To repeat what I said before, I see no way to establish an absolute scale for information that works in all practical applications. For example, let X be the area of one face of a cube. If X is uniformly distributed on the interval [1,4], then the distribution of the side of the cube is not uniform and the distribution of the volume of the cube is not uniform. The three distributions have different entropies. Unless a person specifies which quantity (length, area, volume) is of concern to him, there is no way to establish which entropy has practical significance.
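
A quick numerical check of that claim (a rough sketch; the densities of the side S = X^(1/2) and the volume V = X^(3/2) follow from the change of variables, and the entropies are differential entropies in nats):

```python
import numpy as np

# X = area of one face, uniform on [1, 4]; side S = X**0.5; volume V = X**1.5.
# Differential entropy h(f) = -integral of f ln f, via a midpoint Riemann sum.
def diff_entropy(pdf, lo, hi, n=200000):
    t = np.linspace(lo, hi, n, endpoint=False) + (hi - lo) / (2 * n)  # midpoints
    f = pdf(t)
    return -np.sum(f * np.log(f)) * (hi - lo) / n

h_area   = diff_entropy(lambda t: np.full_like(t, 1 / 3), 1, 4)   # ln 3 ≈ 1.10
h_side   = diff_entropy(lambda t: 2 * t / 3, 1, 2)                # ≈ -0.02
h_volume = diff_entropy(lambda t: 2 / (9 * t ** (1 / 3)), 1, 8)   # ≈ 1.93
```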

When you deal with discrete distributions, you avoid the above problem. For example, if the area of one face of a cube can only have the discrete values 1 or 4, each with probability 1/2, then the probability distributions for the possible sides and volumes each have two possible values with probability 1/2.
 

Related to Multiplication of marginal densities

What is multiplication of marginal densities?

Multiplication of marginal densities is a mathematical operation that combines two or more one-variable probability distributions into a joint probability distribution for the corresponding variables. The result is the joint density the variables would have if they were independent.

Why is multiplication of marginal densities important in scientific research?

Multiplication of marginal densities is important in scientific research because it provides a simple reference model for systems with multiple variables: the product of the marginals describes how the variables would behave jointly if they were independent, which can serve as an approximation or as a baseline against which dependence is measured.

What is the difference between multiplication of marginal densities and convolution?

The two operations answer different questions: multiplying the marginal densities of independent variables gives their joint density, while convolving the two densities gives the density of their sum (again assuming independence). Both operations apply to discrete and continuous variables alike, with sums replacing integrals in the discrete case.

How is multiplication of marginal densities used in data analysis?

In data analysis, the product of marginal densities is often used as an independence model for multiple variables in a dataset. Comparing it with the observed joint distribution (for example via mutual information) helps identify relationships and patterns between the variables, which can inform further analysis and decision-making.

What are some applications of multiplication of marginal densities in scientific fields?

Multiplication of marginal densities has various applications in scientific fields such as statistics, physics, biology, and economics. It is used for modeling complex systems, predicting outcomes, and understanding the relationships between variables in these fields.
