Expected Value and Variance for Wilcoxon Signed-Rank Test

Mogarrr · Oct 11, 2014

Using a normal approximation method for the Wilcoxon Signed-Rank Test, I've seen that the expected value is [itex] \mu = \frac {n(n+1)}2 [/itex] and the variance is [itex] \sigma^2 = \frac {n(n+1)(2n+1)}{24} [/itex].

I'm wondering why these are the expected value and variance.

I do recognize the formulas for the sum of the first N natural numbers and the sum of their squares.

I have an idea as to why the expected value should be half the sum of the first N natural numbers. Under the null hypothesis, roughly half of the differences should be positive, so it would make sense to halve that sum.

I have no intuition for the variance of the distribution.

An explanation would be appreciated.

h6ss · Oct 13, 2014

Are you sure you're right about the expected value?

The statistic used in a signed-rank test is

[itex]W=\sum_{i=1}^{n}I_iR_i[/itex]

where [itex]R_i[/itex] is the rank of [itex]|x_i-y_i|[/itex] among the absolute differences and [itex]I_i[/itex] is an indicator variable equal to [itex]1[/itex] if [itex]x_i-y_i[/itex] is positive and [itex]0[/itex] if it is negative, for pairs [itex](x_i,y_i)[/itex] drawn from the continuous distributions of the random variables [itex]X_i[/itex] and [itex]Y_i[/itex] (so a zero difference has probability zero).

Now, note that

[itex]W=\sum_{i=1}^{n}I_iR_i[/itex]

has the same distribution as

[itex]U=\sum_{i=1}^{n}U_i[/itex],

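where the [itex]U_i[/itex] are independent with [itex]P(U_i=0)=P(U_i=i)=0.5[/itex], since both [itex]W[/itex] and [itex]U[/itex] are sums of random subsets of [itex]1,2,...,n[/itex] in which, under the null hypothesis, each number is included with probability one half.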

In other words, the equal chances of falling on either a negative or a positive difference are equivalent to the equal chances of being included in the sum or not.

Therefore,

[itex]E(W)=E(U)=\sum_{i=1}^{n}E(U_i)=\sum_{i=1}^{n}\left[0\cdot\frac{1}{2}+i\cdot\frac{1}{2}\right]=\frac{1}{2}\sum_{i=1}^{n}i[/itex]

And we know that

[itex]\sum_{i=1}^{n}i=\frac{n(n+1)}{2}[/itex],

Therefore,

[itex]E(W)=\frac{n(n+1)}{4}[/itex].

Now what would you get for the variance, working with [itex]Var(W)=Var(U)[/itex] knowing the [itex]U_i[/itex] are independent?

A similar calculation would do the trick.

In fact, the result makes sense: the test statistic [itex]W[/itex] ranges from a minimum of [itex]0[/itex], if all the differences are negative, to a maximum of [itex]\frac{n(n+1)}{2}[/itex], if all the differences are positive. Since the null distribution is symmetric (each difference is equally likely to be positive or negative), the mean of [itex]W[/itex] sits at the midpoint of that range, [itex]\frac{n(n+1)}{4}[/itex].
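If you want to convince yourself numerically, here is a minimal Python sketch that enumerates all sign patterns for a small n and checks the mean and the range of W. The function name `signed_rank_null_distribution` is just made up for this illustration, and the code assumes the null model described above (each rank included independently with probability 1/2, no ties).

```python
from itertools import product
from fractions import Fraction

def signed_rank_null_distribution(n):
    """Exact null distribution of W: each rank 1..n is kept with probability 1/2."""
    counts = {}
    for signs in product((0, 1), repeat=n):
        # W is the sum of the ranks whose difference is positive (sign == 1)
        w = sum(rank for rank, keep in zip(range(1, n + 1), signs) if keep)
        counts[w] = counts.get(w, 0) + 1
    total = 2 ** n  # all sign patterns are equally likely under the null
    return {w: Fraction(c, total) for w, c in counts.items()}

n = 6
dist = signed_rank_null_distribution(n)
mean = sum(w * p for w, p in dist.items())
print(mean, Fraction(n * (n + 1), 4))  # exact mean next to n(n+1)/4
print(min(dist), max(dist))            # smallest and largest possible W
```

Brute force over 2^n patterns is only practical for small n, but that is enough to check the formula.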

Mogarrr · Oct 15, 2014

Right. I wrote down the wrong number for the expected value.

So similarly, [itex] E W^2 = E U^2 = \sum_{i=1}^n \left(0 \cdot \frac 12 + i^2 \cdot \frac 12\right) = \frac 12 \sum_{i=1}^n i^2 = \frac {n(n+1)(2n+1)}{12}[/itex].

Then the variance of W is [itex] EW^2 - (EW)^2 [/itex], but this quantity doesn't seem to come out to be the variance I was given.

h6ss · Oct 15, 2014

Be careful, there's a difference between [itex]U[/itex] and [itex]U_i[/itex]!

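You're assuming [itex]E(U^2)=\sum_{i=1}^{n}E(U_i^2)[/itex], but squaring the sum also produces cross terms: for independent [itex]U_i[/itex], [itex]E(U^2)=\sum_{i}E(U_i^2)+\sum_{i\neq j}E(U_i)E(U_j)[/itex]. The expectation is additive, [itex]E(U)=\sum_{i=1}^{n}E(U_i)[/itex], by linearity alone. What independence gives us is that the variance is additive too: [itex]Var(U)=\sum_{i=1}^{n}Var(U_i)[/itex].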

We should have:

[itex]Var(U_i) = E(U_i^2)-[E(U_i)]^2 = \left(0^2 \cdot \frac 12 + i^2 \cdot \frac 12\right) - \left(0 \cdot \frac 12 + i \cdot \frac 12\right)^2= \frac {i^2}{2} - \left(\frac{i}{2}\right)^2 = \frac{i^2}{4}[/itex]

And finally,

[itex]Var(W) = \sum_{i=1}^{n} Var(U_i) = \sum_{i=1}^{n} \frac {i^2}{4} = \frac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}[/itex]

gives us the expected result.
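As a sanity check on the formula, here is a rough Python sketch that computes Var(W) by brute force for small n and compares it with n(n+1)(2n+1)/24. The helper names `exact_variance` and `formula_variance` are made up for this illustration, and the null model (independent signs, probability 1/2 each, no ties) is assumed.

```python
from itertools import product
from fractions import Fraction

def exact_variance(n):
    """Var(W) by brute force over all 2**n equally likely sign patterns."""
    values = [
        sum(rank for rank, keep in zip(range(1, n + 1), signs) if keep)
        for signs in product((0, 1), repeat=n)
    ]
    mean = Fraction(sum(values), len(values))
    # population variance: average squared deviation from the mean
    return sum((v - mean) ** 2 for v in values) / len(values)

def formula_variance(n):
    """Closed form obtained by summing Var(U_i) = i**2 / 4 over i = 1..n."""
    return Fraction(n * (n + 1) * (2 * n + 1), 24)

for n in range(1, 9):
    assert exact_variance(n) == formula_variance(n)
print("formula matches brute force for n = 1..8")
```

Exact fractions are used instead of floats so the comparison is an equality and not a tolerance check.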

ron_vancouver · Jan 6, 2015

A follow-up question. For the expectation and variance of the Wilcoxon statistic, does the value of n (i.e., the number of pairs) exclude pairs in which the difference is zero? Say you have 100 pairs (n = 100), but for one of the pairs the two observations have the same score, so that pair is excluded when determining ranks. To determine whether the obtained W is significant, you convert it to a z score using (W-expW)/sqrt(varW). So again my question: in this scenario, to compute expW and varW, does n = 100 or does n = 99?

h6ss · Feb 4, 2015

In most applications of the Wilcoxon test, we omit the pairs for which the difference between ##X_i## and ##Y_i## is zero, since they provide no information about the direction of the difference. The value of n used in the expected value and variance is then the number of remaining pairs, so n = 99 in your example.
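To make the bookkeeping concrete, here is a minimal Python sketch of the z score with zero differences dropped. The function name `signed_rank_z` and the sample data are made up for illustration. The sketch assumes no ties among the nonzero absolute differences and applies no continuity correction, so treat it as an outline and not a full implementation.

```python
import math

def signed_rank_z(x, y):
    """Normal-approximation z score for W (sum of ranks of positive differences).

    Assumptions of this sketch: zero differences are dropped, and the
    remaining absolute differences contain no ties.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)  # reduced sample size after dropping zero differences
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != n:
        raise ValueError("tied absolute differences are not handled in this sketch")
    rank = {v: i + 1 for i, v in enumerate(abs_sorted)}
    w = sum(rank[abs(d)] for d in diffs if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return n, w, (w - mean) / math.sqrt(var)

# Five pairs, one of which has a zero difference and is dropped
x = [10, 12, 9, 15, 11]
y = [8, 12, 10, 11, 14]
n, w, z = signed_rank_z(x, y)
print(n, w, z)  # n is 4 here, not 5
```

With only four usable pairs the normal approximation is not appropriate in practice. The tiny data set is only there to show which n goes into the formulas.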
