The Missing information function

  • #1
CAF123

Homework Statement


Consider the missing information function ##S(\left\{p_i\right\}) = -k \sum_{i=1}^r p_i \ln p_i## where ##\left\{p_i\right\}## represents the probabilities associated with the ##r## mutually exclusive outcomes of some procedure.

i) Sketch the form of ##S## as a function of ##p_1## for ##r=2##.
ii) Show that ##S## has its maximum value when ##p_i = 1/r## for all ##i##.
iii) Show that the value of ##S## is increased when any two of the probabilities are changed in such a way as to reduce their difference.

2. Homework Equations

3. The Attempt at a Solution

In i), we take ##S = S(p_1, p_2)## and regard ##p_2## as a constant. Then ##p_1 \in [0,1]## is the domain, and the plot of ##S## versus ##p_1## looks like a skewed hump with a maximum at ##p_1 = 1/e## and ##S=0## when ##p_1 = 0,1##. Did I interpret the question correctly here? I just don't see why we restrict ##r## to 2.

ii) is fine I think. I get that ##p_j = 1/e## then ##\sum_{i=1}^r p_j = 1 \Rightarrow r (1/e) = 1 = r p_j \Rightarrow p_j = 1/r##. What is the significance of 1/e though? The result wouldn't change as long as it was a constant.

In iii), I'm not really sure where to start. I think the condition is that if ##|p_i + \delta - (p_j + \gamma)| < |p_i - p_j|## then ##dS > 0##, where ##\delta, \gamma## are real numbers, but I'm not sure how to implement this.
 
  • #2
(i) You cannot regard ##p_2## as constant. Exactly one of the ##r## outcomes always has to happen, so the sum over all ##p_i## has to be 1.
(ii) I don't understand your argument, but the important part of the calculation is certainly missing.
(iii) I would use this to prove (ii). Due to the fixed sum, you cannot introduce two different variables for the changes. Yes, the idea is to show that such a change increases the "missing information" with the formula you have.
 
  • #3
mfb said:
(i) You cannot regard ##p_2## as constant. Exactly one of the ##r## outcomes always has to happen, so the sum over all ##p_i## has to be 1.
Yes, I understand, but the question asks to plot ##S## as a function of ##p_1##. If ##r=2##, then ##S = S(p_1, p_2)## which I am treating as a function of two variables. If I am to plot this as a function of ##p_1## only then via the constraint ##p_1 + p_2 = 1##, I can obtain ##S(p_1)##.
(ii) I don't understand your argument, but the important part of the calculation is certainly missing.
I took the derivative as shown and found ##p_j = e^{-1}##. Then, using the constraint ##\sum_{i=1}^r p_i = 1##: ##p_i## is a constant, so the sum gives ##rp_i##, and since this is equal to one I obtain ##p_i = 1/r##?
(iii) I would use this to prove (ii). Due to the fixed sum, you cannot introduce two different variables for the changes. Yes, the idea is to show that such a change increases the "missing information" with the formula you have.
Ok, so is the equation I wrote down correct just to check? I'm not sure how to implement this into the calculation.
Thanks.
 
  • #4
CAF123 said:
Yes, I understand, but the question asks to plot ##S## as a function of ##p_1##. If ##r=2##, then ##S = S(p_1, p_2)## which I am treating as a function of two variables. If I am to plot this as a function of ##p_1## only then via the constraint ##p_1 + p_2 = 1##, I can obtain ##S(p_1)##.

I'll try to help a little while mfb is away having fun. mfb was pointing out that you can't vary ##p_1## without simultaneously varying ##p_2##, due to the constraint that the ##p##'s add to 1. The constraint reduces the number of independent ##p##'s from ##r## to ##r-1##. So, for the case of ##r = 2##, you can think of ##S## as a function of one variable, say ##p_1##. But then ##p_2## is now a function of ##p_1##.
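(For concreteness, with ##r=2## the constraint gives
$$S(p_1) = -k\big[\,p_1 \ln p_1 + (1 - p_1)\ln(1 - p_1)\,\big],$$
which vanishes at ##p_1 = 0, 1## and is symmetric about ##p_1 = 1/2##.)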

I took the derivative as shown and found ##p_j = e^{-1}##. Then, using the constraint ##\sum_{i=1}^r p_i = 1##: ##p_i## is a constant, so the sum gives ##rp_i##, and since this is equal to one I obtain ##p_i = 1/r##?

When finding the maximum of ##S##, did you include the effect of the constraint? The Lagrange multiplier method might be helpful.

Ok, so is the equation I wrote down correct just to check? I'm not sure how to implement this into the calculation.
Thanks.

For (iii), without loss of generality you may take ##p_i## to be the larger of ##\{p_i, p_j\}##. Let ##\epsilon## be the change in ##p_j##. Is ##\epsilon > 0## or is ##\epsilon < 0##? Can you express the change in ##p_i## also in terms of the same ##\epsilon##?
 
  • #5
Hi TSny,
TSny said:
I'll try to help a little while mfb is away having fun. mfb was pointing out that you can't vary ##p_1## without simultaneously varying ##p_2##, due to the constraint that the ##p##'s add to 1. The constraint reduces the number of independent ##p##'s from ##r## to ##r-1##. So, for the case of ##r = 2##, you can think of ##S## as a function of one variable, say ##p_1##. But then ##p_2## is now a function of ##p_1##.
Ok, so this produces a graph that is a symmetric hump with a maximum at ##p_1 = 1/2##, hitting the ##p_1## axis at ##p_1 = 0,1##. I re-express all instances of ##p_2## as ##1-p_1##.
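For what it's worth, a quick plotting sketch of that hump (##k = 1## is an arbitrary choice and only sets the vertical scale):

```python
import numpy as np
import matplotlib.pyplot as plt

k = 1.0
p1 = np.linspace(1e-6, 1 - 1e-6, 500)          # avoid the endpoints where ln diverges
S = -k * (p1 * np.log(p1) + (1 - p1) * np.log(1 - p1))

plt.plot(p1, S)
plt.xlabel("$p_1$")
plt.ylabel("$S$")
plt.show()
```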

When finding the maximum of ##S##, did you include the effect of the constraint? The Lagrange multiplier method might be helpful.
Oh yes, this gives ##p_j = \exp(-\lambda/k - 1) = \text{const}##, where ##\lambda## is the multiplier, and then I put this into the constraint ##\sum_{i=1}^r p_i = 1##. This is $$\sum_{i=1}^r \exp(-1-\lambda/k) = \text{const} \cdot r = 1 \Rightarrow \text{const} = 1/r = p_i$$
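(For the record, the stationarity step behind this is
$$\frac{\partial}{\partial p_j}\left[-k\sum_{i=1}^r p_i \ln p_i - \lambda\left(\sum_{i=1}^r p_i - 1\right)\right] = -k(\ln p_j + 1) - \lambda = 0 \;\Rightarrow\; p_j = e^{-1 - \lambda/k},$$
the same constant for every ##j##.)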

For (iii), without loss of generality you may take ##p_i## to be the larger of ##\{p_i, p_j\}##. Let ##\epsilon## be the change in ##p_j##. Is ##\epsilon > 0## or is ##\epsilon < 0##? Can you express the change in ##p_i## also in terms of the same ##\epsilon##?
I think ##\epsilon## can be greater or smaller than zero. If ##\epsilon < 0## then ##p_j' = p_j - \epsilon##. Then ##p_i' \in (p_i - \epsilon, p_j - \epsilon)## so as to reduce the difference between the two. If ##\epsilon > 0## then ##p_i' > p_i + \epsilon < p_j - \epsilon##
 
  • #6
CAF123 said:
Hi TSny,

Ok, so this produces a graph that is a symmetric hump with a maximum at ##p_1 = 1/2##, hitting the ##p_1## axis at ##p_1 = 0,1##. I re-express all instances of ##p_2## as ##1-p_1##.
OK
Oh yes, this gives ##p_j = \exp(-\lambda/k - 1) = \text{const}##, where ##\lambda## is the multiplier, and then I put this into the constraint ##\sum_{i=1}^r p_i = 1##. This is $$\sum_{i=1}^r \exp(-1-\lambda/k) = \text{const} \cdot r = 1 \Rightarrow \text{const} = 1/r = p_i$$
OK
I think ##\epsilon## can be greater or smaller than zero.
If ##p_j## is the smaller of ##\{p_i, p_j\}##, and if ##\epsilon## is the amount that you change ##p_j##, then ##\epsilon## can have only one particular sign if you are going to reduce the separation between ##p_j## and ##p_i##.
##p_i' \in (p_i - \epsilon, p_j - \epsilon)##
If you reduce ##p_j## and ##p_i## by the same amount, will the constraint condition still hold?
 
  • #7
If ##p_j## is the smaller of ##\{p_i, p_j\}##, and if ##\epsilon## is the amount that you change ##p_j##, then ##\epsilon## can have only one particular sign if you are going to reduce the separation between ##p_j## and ##p_i##.
I was thinking along the lines of: if ##p_j## decreases by ##\epsilon##, then ##p_i## must decrease by an amount greater than ##\epsilon## so that their difference is reduced. If ##p_j## increased by ##\epsilon##, then ##p_i## can increase or decrease, with an increase not bigger than ##\epsilon##.
If you reduce ##p_j## and ##p_i## by the same amount, will the constraint condition still hold?
No, that would mean their difference is not reduced. That is why I left the interval open; however, I was (accidentally) taking ##p_j## to be larger than ##p_i## in that analysis, contrary to the notation you gave.
 
  • #8
Since the ##p##'s must always add to 1, what must happen to ##p_i## if you increase ##p_j##?
 
  • #9
TSny said:
Since the ##p##'s must always add to 1, what must happen to ##p_i## if you increase ##p_j##?
Oh I see what you were getting at, yes sorry one must increase while the other decreases so as to not violate conservation of probability. So, if ##p_i > p_j## and we are to reduce the difference then ##p_j## must increase so that ##p_j' = p_j + \epsilon, \epsilon > 0## while ##p_i' = p_i - \epsilon##. Then ##\sum_k p_k = 1## still.

So, ##S' - S = -k[ (p_j + \epsilon) \ln (p_j + \epsilon) + (p_i - \epsilon)\ln (p_i - \epsilon)]##. Since ##p_i > p_j##, ##p_j + \epsilon > 0## and ##p_i - \epsilon \geq 0## (i.e. if we have initially ##p_i = \epsilon, p_j = 0##, assuming we can attribute a zero probability to one of the ##p_k##'s). So then ##S'-S>0##. Is it fine?

Maybe one thing I am not getting is why would the difference between the two probabilities change? If ##p_j' = p_j + \epsilon, p_i' = p_i - \epsilon## then one increases by some amount while the other decreases so the difference is the same.

Thanks!
 
  • #10
CAF123 said:
Maybe one thing I am not getting is why would the difference between the two probabilities change? If ##p_j' = p_j + \epsilon, p_i' = p_i - \epsilon## then one increases by some amount while the other decreases so the difference is the same.
Subtract them, and see how the difference differs from the previous difference?
Alternatively, consider this numerical example: Increase 2 by 1 and decrease 7 by 1 and see if the difference changes.
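(Concretely: the gap starts at ##7 - 2 = 5## and becomes ##(7-1) - (2+1) = 3##, i.e. it shrinks by twice the shift.)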

CAF123 said:
assuming we can attribute a zero probability to one of the ##p_k##'s)
No you cannot.
 
  • #11
mfb said:
Subtract them, and see how the difference differs from the previous difference?
Alternatively, consider this numerical example: Increase 2 by 1 and decrease 7 by 1 and see if the difference changes.
Yes, sorry, I was being silly; I had been at it for too long last night and was too tired.

No you cannot.
Yes, I thought so; otherwise there would not be ##r## possible outcomes. But I believe my argument from before still works, except now with ##p_i - \epsilon > 0##. Yes? Thanks!
 
  • #12
Where is your proof that ##S'-S>0##, apart from the easy case where some parts are zero?
 
  • #13
mfb said:
Where is your proof that ##S'-S>0##, apart from the easy case where some parts are zero?

##S' = -k [p_1 \ln p_1 + \dots + (p_j + \epsilon)\ln (p_j + \epsilon) + \dots + (p_i - \epsilon) \ln (p_i - \epsilon) + \dots ]## and ##S = -k[p_1 \ln p_1 + \dots ]##, so $$S' - S = k\left[p_j \ln \left( \frac{p_j}{p_j + \epsilon}\right) + p_i \ln \left(\frac{p_i}{p_i - \epsilon}\right) - \epsilon \ln \left( \frac{p_j + \epsilon}{p_i - \epsilon}\right)\right]$$ The argument of the first log is < 1, so that term is negative. The argument of the second log is > 1, so that term is positive. The argument of the last log can be > 1 or < 1. So I am not sure how to draw any conclusions?
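As a quick numerical sanity check (not a proof; the distribution and the ##\epsilon## values below are arbitrary choices), the sign does come out positive:

```python
import numpy as np

def S(p, k=1.0):
    """Missing information S = -k * sum_i p_i ln p_i."""
    p = np.asarray(p, dtype=float)
    return -k * np.sum(p * np.log(p))

# Arbitrary distribution with p_i > p_j (indices chosen for illustration).
p = np.array([0.5, 0.3, 0.2])   # p_i = p[0] = 0.5, p_j = p[2] = 0.2
i, j = 0, 2

for eps in [0.01, 0.05, 0.1]:
    q = p.copy()
    q[j] += eps                  # raise the smaller probability
    q[i] -= eps                  # lower the larger one, so the sum stays 1
    print(eps, S(q) - S(p))      # positive as long as q[j] < q[i]
```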
 
  • #15
mfb said:
You still have ##p_j < p_i##.
Ok, so the argument of the last log can be rewritten like $$\frac{p_j}{p_i - \epsilon} + \frac{\epsilon}{p_i - \epsilon},$$ so if we are changing the ##p_i## and ##p_j## so as to only reduce their difference, then the first term is ##< 1## and the last term is ##< 1##. However, I still have a term (the first one) that has argument ##< 1##, so it is difficult to make conclusions without knowing their relative difference. Thanks.
 
  • #16
Then you'll have to find some other way to rewrite the expressions.
It might help (I have not tested it) to define ##p_j = p_i + x##.

A completely different approach: Consider ##S(p_i)## as a function of one independent variable (different from what you did for the other parts), and show that its second derivative exists and is negative everywhere.
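(A sketch of where that second route leads, writing ##C = p_i + p_j## for the fixed pair sum and holding the other probabilities constant: the pair contributes ##-k[\,p_i \ln p_i + (C - p_i)\ln(C - p_i)\,]## to ##S##, and
$$\frac{d^2 S}{dp_i^2} = -k\left(\frac{1}{p_i} + \frac{1}{C - p_i}\right) < 0$$
for ##0 < p_i < C##, so ##S## is strictly concave along this direction.)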
 
  • #17
mfb said:
Then you'll have to find some other way to rewrite the expressions.
It might help (I have not tested it) to define ##p_j = p_i + x##.

A completely different approach: Consider ##S(p_i)## as a function of one independent variable (different from what you did for the other parts), and show that its second derivative exists and is negative everywhere.

Ok, thanks, I'll think more about your suggestion. In the meantime, I wonder if TSny could perhaps tell me what he had in mind when he was directing me in the previous method? :)
 
  • #18
CAF123 said:
In the meantime, I wonder if TSny could perhaps tell me what he had in mind when he was directing me in the previous method? :)

I was thinking of letting ##\epsilon## be a small, first-order quantity and considering the change ##\delta S## when ##\delta p_j = \epsilon##. Note ##\delta p_i## is then ##-\epsilon##. (As before, I'm thinking of ##p_j## as the smaller of the two p's.)

Once you get the result for an infinitesimal ##\epsilon##, you can easily generalize to a finite change.

This approach is similar to mfb's suggestion of evaluating ##\frac{\partial S}{\partial p_j}## while treating ##p_i## as a function of ##p_j##.
 
  • #19
TSny said:
I was thinking of letting ##\epsilon## be a small, first-order quantity and considering the change ##\delta S## when ##\delta p_j = \epsilon##. Note ##\delta p_i## is then ##-\epsilon##. (As before, I'm thinking of ##p_j## as the smaller of the two p's.)

Once you get the result for an infinitesimal ##\epsilon##, you can easily generalize to a finite change.

This approach is similar to mfb's suggestion of evaluating ##\frac{\partial S}{\partial p_j}## while treating ##p_i## as a function of ##p_j##.
Ok, so if I understand you correctly, I am using my expression for ##S'-S## in #13 and rewriting all instances of ##p_i## with ##1-p_j - \sum_{k=1, k \neq i,j} p_k## (to re-express ##p_i## in terms of ##p_j##). Then take the derivative with respect to ##p_j##?
 
  • #20
You want to show that ##\delta S > 0## when ##p_j## and ##p_i## are moved closer to each other. The ##\delta## operation is essentially the same as taking a differential since we are considering ##\epsilon## as small.

So, consider ##\delta \left ( -k \sum_{n=1}^r p_n \ln p_n \right )##. Only two of the ##p##'s vary: ##p_i## and ##p_j##. What do you get for ##\delta \left ( p_i \ln p_i \right )## and ##\delta \left ( p_j \ln p_j \right )## when ##\delta p_j = \epsilon## and ##\delta p_i = - \epsilon##?
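(For reference, a sketch of carrying that differential through: ##\delta(p \ln p) = (\ln p + 1)\,\delta p##, so
$$\delta S = -k\big[(\ln p_j + 1)\,\epsilon - (\ln p_i + 1)\,\epsilon\big] = k\,\epsilon\,\ln\frac{p_i}{p_j} > 0,$$
since ##\epsilon > 0## and ##p_j < p_i##.)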
 

