#### lfdahl

- Nov 26, 2013
Each of some pennies is flipped repeatedly until it first comes up heads. The winner(s) are the penny(s) which were flipped the most times. Prove that

the probability there is only one winner is at least $\frac{2}{3}$.


I took "some pennies" to mean $n$ pennies where $n$ is any arbitrary natural number you like $\geq 2$ .

We probably could solve this directly with order statistics but it seemed like a lot of calculations with not that much insight.

My approach below is conceptually simple: for $n\geq 4$, the probability of having a unique maximal penny (i.e. a non-bulk arrival in the final epoch of the merged Bernoulli process) sits right around 0.71 to 0.72. Nice random variables like sums of Bernoullis concentrate about their mean, so we can ignore the probabilities associated with the tails for large $n$. This then implies that all the probabilities have to be $\gt 0.7$ for large $n$. That really is the whole argument. Unfortunately, fleshing this out ended up being a bit long.

lemma:

$\sum_{k=21}^\infty 2 e^4 \cdot e^{-k/2} = \sum_{k=0}^\infty 2 e^4 \cdot e^{-(k+21)/2} = \frac{2}{e^{6.5}-e^{6}} \approx 0.0076$

(geometric sum)

This implies

$a_r = \prod_{k = 0}^r \big( 1 - 2 e^4 \cdot e^{-(k+21)/2} \big) \geq 1 - \sum_{k=0}^\infty 2 e^4 \cdot e^{-(k+21)/2} \geq 1- \frac{2}{e^{6.5}-e^{6}} \approx 0.992358 $

by the union bound

so the sequence is monotone decreasing: $1 \gt a_0 \gt a_1 \gt a_2 \gt ...$

and bounded below by 0.992, so the limit exists and is $\in (0.992,1)$
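As a purely numerical sanity check of the lemma (not needed for the proof), a short Python sketch:

```python
# Sanity check of the lemma: geometric tail and the product lower bound.
import math

# geometric sum: sum_{k=21}^inf 2 e^4 e^{-k/2} = 2 / (e^6.5 - e^6)
tail = sum(2 * math.e**4 * math.exp(-k / 2) for k in range(21, 2000))
closed_form = 2 / (math.e**6.5 - math.e**6)
assert abs(tail - closed_form) < 1e-9

# the partial products a_r stay above the union-bound floor and never increase
floor = 1 - closed_form          # about 0.992358
a = []
prod = 1.0
for k in range(200):
    prod *= 1 - 2 * math.e**4 * math.exp(-(k + 21) / 2)
    a.append(prod)
assert all(x > floor for x in a)
# strictly decreasing in exact arithmetic; non-increasing up to float precision
assert all(a[r] >= a[r + 1] for r in range(len(a) - 1))
```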

- - - -

For example, suppose we have $n = 20$ pennies. This is equivalent to a merger of 20 different Bernoulli processes, each with probability of success $\frac{1}{2}$, where a given process "drops out" upon its first success / arrival. (This is exactly analogous to order statistics of exponential r.v.'s and Poisson merging and splitting, except Poisson processes are cleaner, with zero probability of bulk arrivals -- which is the point of this problem, I suppose.)

If there are $s$ successes (bulk arrivals) in the first round, then we have $n-s$ pennies in the second round, equivalently a merger of $n-s$ different Bernoulli processes. By memorylessness this holds for any round. E.g. if there are 20 pennies at the beginning of a round and 4 successes, then there are 16 pennies still being tossed next round. The number of successes in a round with $m$ pennies at the beginning is binomially distributed with parameters $m$ and $\frac{1}{2}$. This is enough to form an absorbing-state Markov chain in which 0 and 1 are absorbing states. Literally interpreted, the problem asks for the probability of entering state 0 through state 1; but anything in state 1 enters state 0 with probability 1, so we can make state 1 absorbing to capture the total probability of paths entering state 1 before finishing. (Equivalently, items absorbed in state 0 have bulk arrivals in their final success period -- i.e. multiple pennies with the same maximal number of tosses amongst the $n$ pennies.)

The Markov chain has a row-stochastic transition matrix that looks like

$P = \begin{bmatrix}

1 & 0& \mathbf 0^T \\

0 & 1& \mathbf 0^T \\

\mathbf r_0 & \mathbf r_1 & \mathbf T \\

\end{bmatrix}$

where $\mathbf T$ represents the transient states (which implies a spectral radius strictly less than one), and happens to be lower triangular

for our example, the problem would have states $\{0,1,2,...,20\}$ -- so it would be a 21 x 21 matrix and $\mathbf T$ would be 19 x 19

using standard absorbing-state Markov chain techniques (justified by 'first step analysis'), we have

$\big(\mathbf I - \mathbf T\big) \mathbf v = \mathbf r_1$

so

$\mathbf v =\big(\mathbf I - \mathbf T\big)^{-1}\mathbf r_1$

if we consider

$\mathbf x =\begin{bmatrix}

0\\

1 \\

\mathbf v\\

\end{bmatrix}$

then $\mathbf x$ is a right eigenvector, i.e. $P\mathbf x = \mathbf x$

which can be justified on linear algebra grounds, or more easily on martingale grounds.

Here is what the transition matrix looks like when $m= 6$

$P =\left[\begin{matrix}1 & 0 & 0 & 0 & 0 & 0 & 0\\0 & 1 & 0 & 0 & 0 & 0 & 0\\0.25 & 0.5 & 0.25 & 0 & 0 & 0 & 0\\0.125 & 0.375 & 0.375 & 0.125 & 0 & 0 & 0\\0.0625 & 0.25 & 0.375 & 0.25 & 0.0625 & 0 & 0\\0.03125 & 0.15625 & 0.3125 & 0.3125 & 0.15625 & 0.03125 & 0\\0.015625 & 0.09375 & 0.234375 & 0.3125 & 0.234375 & 0.09375 & 0.015625\end{matrix}\right]$

(I'm more interested in $m=20$ but it's too big and messy to show)

Code:

```
# python 3.x
import numpy as np
from scipy.special import comb

def binom_formula(p, n, k):
    # binomial pmf: probability of k successes in n trials
    return comb(n, k) * p**k * (1 - p)**(n - k)

m = 20
p = 1/2
A = np.zeros((m + 1, m + 1))
A[0, 0] = 1  # absorbing state 0: no unique winner
A[1, 1] = 1  # absorbing state 1: unique winner
for i in range(2, m + 1):      # transient states: i pennies still in play
    for j in range(0, i + 1):  # j pennies survive the round (i - j heads)
        A[i, j] = binom_formula(p, i, j)
```

solving for $m=20$ gives

$\mathbf x^{(20)}= \left[\begin{matrix}0.0\\1.0\\0.666666666666667\\0.714285714285714\\0.723809523809524\\0.723502304147466\\0.721966205837174\\0.721107442214884\\0.720915198693142\\0.721046098475298\\0.721245657455507\\0.721394775455441\\0.721464805972235\\0.721470062681967\\0.721437342436462\\0.721390932277993\\0.721347423650618\\0.721315546784049\\0.721297995973785\\0.721293636660732\\0.721299387756606\end{matrix}\right]$
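(For anyone wanting to reproduce this vector: a sketch of the first-step-analysis solve, using `math.comb` rather than scipy so it is self-contained; the state convention is the same as above.)

```python
# Sketch: solve (I - T) v = r_1 for m = 20 and recover the fixed point x.
import numpy as np
from math import comb

m = 20
P = np.zeros((m + 1, m + 1))
P[0, 0] = P[1, 1] = 1.0          # absorbing states 0 and 1
for i in range(2, m + 1):        # i pennies still in play
    for j in range(i + 1):       # j pennies survive: i - j heads
        P[i, j] = comb(i, j) * 0.5**i
T = P[2:, 2:]                    # transient-to-transient block (19 x 19)
r1 = P[2:, 1]                    # one-step absorption into state 1
v = np.linalg.solve(np.eye(m - 1) - T, r1)
x = np.concatenate(([0.0, 1.0], v))
assert np.allclose(P @ x, x)     # x is the fixed point P x = x
```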

Notice that the bound we want to prove is sharp for the $m=2$ case, but for $m\geq 3$ all values are at least 0.71.

so for $m=21$ we'd like to find the unique value that we could append to the vector $\mathbf x^{(20)}$ to create $\mathbf x^{(21)}$ such that it is still a fixed point of the appropriate transition matrix (and this must be our probability of ultimately being absorbed into state 1)

$L_{21}$

$ = \lim_{r \to \infty} p_{21,1}^{(r)} $

$= p_{21,0}\cdot 0 + p_{21,1}\cdot 1 + p_{21,2}\cdot \frac{2}{3} +\Big( p_{21,3}\cdot 0.714285714285714 + ... + 0.721299387756606 \cdot p_{21,20} + L_{21}\cdot p_{21,21}\Big) $

$\geq \Big( p_{21,3}\cdot 0.714285714285714 + ... + 0.721299387756606 \cdot p_{21,20} + L_{21}\cdot p_{21,21}\Big) $

$= \sum_{j=3}^{21} p_{21,j}\cdot x_j^{(21)} $

$\geq \sum_{j=3}^{18} p_{21,j}\cdot x_j^{(21)} $ (removing the "right tail" technically isn't needed but the reasoning is subtle so I went with a cruder, easier approach)

$= \sum_{j=3}^{18} p_{21,j}\cdot x_j^{(20)} $

$\geq \sum_{j=3}^{18} p_{21,j}\cdot 0.71 $

$=0.71 \sum_{j=3}^{18} p_{21,j} $

$\geq 0.71 \cdot a_0$

where the final inequality follows from (the complement of) the two-sided Chernoff bound

The one-sided Chernoff (Hoeffding) bound is

$P\Big(S_n \geq \frac{n}{2} + c\Big) \leq e^{-\frac{2c^2}{n}} $

selecting

$c= \frac{n}{2} - 2$

(so that $\frac{n}{2} + c = n-2$) gives

$P\Big(S_n \geq n-2\Big) \leq e^{-\frac{2(\frac{n^2}{4} - 2n + 4)}{n}}= e^{-\frac{\frac{n^2}{2} - 4n + 8}{n}}=e^{-(\frac{n}{2} - 4 + \frac{8}{n})}\leq e^{-(\frac{n}{2} - 4)} = e^{-\frac{n}{2}} \cdot e^{4}$

and double it for the two-sided bound (the two tail events are mutually exclusive, so their probabilities -- and hence the upper bounds on them -- add)
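As a spot check for $n=21$ (the case used above), we can compare the exact two-sided binomial tail against $2e^4 e^{-n/2}$:

```python
# Spot check for n = 21: exact two-sided tail of Binomial(n, 1/2)
# vs the bound 2 e^4 e^{-n/2}.
import math

n = 21
total = 2**n
# mass dropped above: states j <= 2 (>= n-2 heads) and j >= 19 (<= 2 heads)
tail = (sum(math.comb(n, k) for k in range(3))
        + sum(math.comb(n, k) for k in range(n - 2, n + 1))) / total
bound = 2 * math.e**4 * math.exp(-n / 2)
assert tail <= bound
```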

Finally, inducting or iterating from here gives the general bound, e.g. for $m=22$ we'd have

$L_{22} \geq 0.71 \cdot a_1\geq 0.71 \cdot 0.992$

and for arbitrary $m\geq 21$ we have

$L_{m} \geq 0.71 \cdot a_{m-21} \geq 0.71 \cdot 0.992 \gt \frac{2}{3}$

- - - -

Hi, steep, thank you indeed for your in-depth analytic approach to the problem.

Unfortunately, I am not capable of evaluating your answer, which by far exceeds the level intended for the problem (please see the suggested solution below).

But I really appreciate your participation and effort!! Thank you so much!

Maybe someone else reading these lines will help evaluate your thorough answer. In fact, this could be a new challenge for the forum!

Hope to see you in new challenges to come!

Suggested solution:

Let's say we start with two pennies ($n=2$).

Define the events:

$A$: There's only one winner (“survivor”)

$B_0$: There are no “survivors” (both pennies show heads).

$B_1$: There's exactly one “survivor”

$B_2$: There are exactly two “survivors”

Clearly, $P_2(A|B_0) = 0$ and $P_2(A|B_1) = 1$.

Also: $P_2(A|B_2) = P_2(A)$.

Using the law of total probability, we get:

\[P_2(A) = \sum_{i=0}^{2}P_2(A|B_i)P_2(B_i) = \left ( \frac{1}{2} \right )^2\left ( \binom{2}{0}\cdot 0+\binom{2}{1}\cdot 1+\binom{2}{2}P_2(A) \right )\]

Thus, $P_2(A) = \frac{2}{3}$. We can use this result to calculate $P_3(A)$ ($n=3$). One gets $P_3(A) = \frac{5}{7}\; (> \frac{2}{3})$. This suggests that we prove the statement by induction: suppose it holds for all $i$ with $2 \leq i \leq n-1$.

We can extend the law of total probability to case $n$, keeping in mind that $P_n(A|B_i) = P_i(A)$ for $2 \leq i \leq n$.

\[P_n(A) = \sum_{i=0}^{n}P_n(A|B_i)P_n(B_i) = \left ( \frac{1}{2} \right )^n\left ( \binom{n}{0}\cdot 0+\binom{n}{1}\cdot 1+\binom{n}{2}P_2(A)+...+\binom{n}{n}P_n(A) \right )\\\\ \left ( 2^n-1 \right )P_n(A) = n + \sum_{i=2}^{n-1}\binom{n}{i}P_i(A) \geq n+\frac{2}{3}\sum_{i=2}^{n-1}\binom{n}{i} = n + \frac{2}{3}\left ( 2^n -1-n-1\right )\\\\ (2^n-1)P_n(A) \geq \frac{2}{3}\left ( 2^n-1 \right )+\frac{1}{3}\left ( n-2 \right )\]

Thus, $P_n(A) \geq \frac{2}{3}+\frac{n-2}{3(2^n-1)}$, which is clearly greater than $\frac{2}{3}$ for $n \geq 3$, and we are done.
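The recursion in this solution is easy to check numerically; a quick sketch in exact arithmetic (not part of the proof):

```python
# Check the recursion (2^n - 1) P_n = n + sum_{i=2}^{n-1} C(n,i) P_i
# with P_2 = 2/3, using exact rationals.
from fractions import Fraction
from math import comb

P = {2: Fraction(2, 3)}
for n in range(3, 21):
    P[n] = (n + sum(comb(n, i) * P[i] for i in range(2, n))) / (2**n - 1)
assert P[3] == Fraction(5, 7)
assert all(P[n] > Fraction(2, 3) for n in range(3, 21))
```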

Yea, I did kind of throw the kitchen sink at this one, and there are probably a few typos in there as I kept changing things (e.g. the sequence should have said $a_r$, not $a_k$).


It occurs to me that my approach to the problem could/should be massively simplified using a 'guess' of $\frac{2}{3}$ for all states $m\geq 2$ and then basically applying the method of relaxations, which is a common technique in Dirichlet problems... Basically I then show that the guess implies a non-decreasing sequence, and it's done. So

$\mathbf x^{(20)}_{\text{'guess' 1 }}= \left[\begin{matrix}0\\1\\ \frac{2}{3}\\ \frac{2}{3} \\ \vdots \\ \frac{2}{3}\\ \frac{2}{3}\end{matrix}\right]$

so

$\mathbf x^{(20)}_{\text{'guess' 2 }} = P\mathbf x^{(20)}_{\text{'guess' 1 }}$

but this implies the non-decreasing sequence below, where the inequalities are understood component-wise:

$\mathbf x^{(20)}_{\text{'guess' 1 }} \leq P\mathbf x^{(20)}_{\text{'guess' 1 }} \leq P^2 \mathbf x^{(20)}_{\text{'guess' 1 }}\leq P^3 \mathbf x^{(20)}_{\text{'guess' 1 }} \leq ...$

This should make for a much shorter argument.

ok, here's the simplified argument. As a rehash:

$P = \begin{bmatrix}

1 & 0& \mathbf 0^T \\

0 & 1& \mathbf 0^T \\

\mathbf r_0 & \mathbf r_1 & \mathbf T \\

\end{bmatrix}$

the essence of the fair game / martingale argument is that in finite dimensions, a fair game always stays fair. (For infinite dimensions, there are delicate convergence issues to address.) So if we get a reward of 1 for visiting (absorbing) state 1 and reward of 0 for visiting (absorbing) state 0, but with probability 1 we always visit one of those states, then we can figure out our probability of visiting state 1 vs state 0 by finding a well chosen $\mathbf f$ such that $P\mathbf f = \mathbf f$ and the zeroth component of $\mathbf f$ is zero and the 1st component of $\mathbf f$ is one, because

$\lim_{r \to \infty} P^r\mathbf f = P\mathbf f = \mathbf f$

you can easily check that $P-I$ has 2 linearly independent vectors in its nullspace, and that $P\mathbf 1 = \mathbf 1$, so we can interpret $\mathbf f$ as the 'other' right eigenvector with eigenvalue 1. (If you prefer, you can choose the other non-zero vector in the nullspace of $P-I$ to be orthogonal to $\mathbf 1$ and write $\mathbf f$ as a linear combination of these.)
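Both claims are easy to check numerically, e.g. for the $m=6$ matrix shown earlier (a sketch, numpy assumed):

```python
# Check: P 1 = 1, and the nullspace of P - I is 2-dimensional (m = 6 case).
import numpy as np
from math import comb

m = 6
P = np.zeros((m + 1, m + 1))
P[0, 0] = P[1, 1] = 1.0
for i in range(2, m + 1):
    for j in range(i + 1):
        P[i, j] = comb(i, j) * 0.5**i
ones = np.ones(m + 1)
assert np.allclose(P @ ones, ones)                        # rows sum to 1
assert np.linalg.matrix_rank(P - np.eye(m + 1)) == m - 1  # nullity 2
```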

= = = = =

note: in substance this argument is problem 27, page 428, of Grinstead and Snell's probability book

https://www.math.dartmouth.edu/~prob/prob/prob.pdf

which is aimed at undergrads, though for whatever reason I don't think many people know this approach.

= = = = =

the question becomes: how do we find this $\mathbf f$ besides a long, messy Gaussian elimination?

The point is that for $m \geq 2 $ (i.e. the transient states) the number of heads in a round is binomially distributed, and with probability of heads $= \frac{1}{2}$ we can compare the probability of getting $m-1$ heads (landing in state 1) vs $m$ heads (landing in state 0) and see

$1 \cdot m 2^{-m} + 0 \cdot 2^{-m} = m 2^{-m} \geq \frac{2}{3}\big(m 2^{-m} + 2^{-m} \big) = \frac{2}{3} \cdot (m+1) 2^{-m}$

or

$m \geq \frac{2}{3} \cdot (m+1) $

or

$\frac{3}{2} m \geq m+1 $ or

$\frac{1}{2} m \geq 1 $ , i.e. $m\geq 2$

$\mathbf x^{(n)}_{\text{'guess' 1 }}= \left[\begin{matrix}0\\1\\ \frac{2}{3}\\ \frac{2}{3} \\ \vdots \\ \frac{2}{3}\\ \frac{2}{3}\end{matrix}\right]$

so looking at row $i\geq 2$ of the above

we see that the $i$th row is given by the following (where $B\big(i,j,\frac{1}{2}\big)$ is the binomial probability of $j$ successes in $i$ trials)

$ \mathbf e_i^T P\mathbf x^{(n)}_{\text{'guess' 1 }} $

$= \Big(0\cdot B\big(i,0, \frac{1}{2}\big) + 1\cdot B\big(i,1, \frac{1}{2}\big)\Big) + \frac{2}{3} \Big( \sum_{j=2}^i B\big(i,j, \frac{1}{2}\big)\Big)$

$ \geq \frac{2}{3}\Big( B\big(i,0, \frac{1}{2}\big) + B\big(i,1, \frac{1}{2}\big)\Big) + \frac{2}{3} \Big( \sum_{j=2}^i B\big(i,j, \frac{1}{2}\big)\Big) $

$ = \frac{2}{3} \Big( \sum_{j=0}^i B\big(i,j, \frac{1}{2}\big)\Big) $

$ = \frac{2}{3} \Big(1 \Big) $

$ = \frac{2}{3} $

and of course for rows zero and one we have

$ \mathbf e_0^T P\mathbf x^{(n)}_{\text{'guess' 1 }} = 0$

$ \mathbf e_1^T P\mathbf x^{(n)}_{\text{'guess' 1 }} = 1$

putting this together:

$ P\mathbf x^{(n)}_{\text{'guess' 1 }} = \mathbf x^{(n)}_{\text{'guess' 1 }} + \mathbf y^{(1)} \geq \mathbf x^{(n)}_{\text{'guess' 1 }}$

__where all inequalities are understood to be component wise inequalities__ so

$\mathbf y^{(1)} \geq 0$

but applying $P$ again gives

$ P^2\mathbf x^{(n)}_{\text{'guess' 1 }} = \mathbf x^{(n)}_{\text{'guess' 1 }} + \mathbf y^{(1)} + \mathbf y^{(2)} \geq P \mathbf x^{(n)}_{\text{'guess' 1 }}$

where $\mathbf y^{(2)} := P\mathbf y^{(1)} \geq 0$, since $P$ and $\mathbf y^{(1)}$ consist solely of non-negative components

Iterating gives a non-decreasing sequence that is bounded above (because these are probabilities...)

$ \mathbf x^{(n)}_{\text{'guess' 1 }}\leq P\mathbf x^{(n)}_{\text{'guess' 1 }} \leq P^2\mathbf x^{(n)}_{\text{'guess' 1 }}\leq ... \leq P^r\mathbf x^{(n)}_{\text{'guess' 1 }} \leq \begin{bmatrix}

0 \\

1 \\

\mathbf 1\\

\end{bmatrix}$

(the highest power is $P^r$, though it's kind of hard to see here)

applying monotone convergence component-wise tells us that a limit $z_i$ exists for each component, and in fact

$\mathbf x^{(n)}_{\text{'guess' 1 }}\leq P\mathbf x^{(n)}_{\text{'guess' 1 }} \leq P^2\mathbf x^{(n)}_{\text{'guess' 1 }}\leq \lim_{r \to \infty} P^r\mathbf x^{(n)}_{\text{'guess' 1 }} = \mathbf z = \lim_{r \to \infty} P\cdot P^r\mathbf x^{(n)}_{\text{'guess' 1 }} = P\cdot \mathbf z$

so $\mathbf z =P\cdot \mathbf z$

and it has a zero in zeroth component, and 1 in first component which tells us it must be $\mathbf f$

hence $\mathbf x^{(n)}_{\text{'guess' 1 }}\leq \mathbf z = \mathbf f$

which completes the problem
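The whole relaxation argument can also be watched numerically; a sketch for $m=20$ (the monotonicity assertion below is exactly the content of the proof above):

```python
# Relaxation sketch: iterate P on the guess [0, 1, 2/3, ..., 2/3];
# the iterates are componentwise non-decreasing and converge to f.
import numpy as np
from math import comb

m = 20
P = np.zeros((m + 1, m + 1))
P[0, 0] = P[1, 1] = 1.0
for i in range(2, m + 1):
    for j in range(i + 1):
        P[i, j] = comb(i, j) * 0.5**i

x = np.full(m + 1, 2 / 3)   # the guess vector
x[0], x[1] = 0.0, 1.0
for _ in range(200):
    nxt = P @ x
    assert np.all(nxt >= x - 1e-12)   # monotone, up to float noise
    x = nxt
```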

After looking at the other provided solution, I think the reason the simpler inductive approach works without needing to introduce Markov chains is that $\mathbf T$ is triangular. In any case, the two are similar in spirit. My writeup is still long, mostly because I tried to include background material. If people know the techniques involved, then the entire argument condenses to

for natural number $i \geq 2$

$0 \cdot 2^{-i}+ 1 \cdot i\cdot 2^{-i} = i\cdot 2^{-i} \geq \frac{2}{3}\big(i\cdot 2^{-i} + 2^{-i} \big) = \frac{2}{3} \cdot (i+1) 2^{-i}$

and

$ \mathbf e_i^T P\mathbf x^{(n)}_{\text{'guess' 1 }} $

$= \Big(0\cdot B\big(i,0, \frac{1}{2}\big) + 1\cdot B\big(i,1, \frac{1}{2}\big)\Big) + \frac{2}{3} \Big( \sum_{j=2}^i B\big(i,j, \frac{1}{2}\big)\Big)$

$ \geq \frac{2}{3}\Big( B\big(i,0, \frac{1}{2}\big) + B\big(i,1, \frac{1}{2}\big)\Big) + \frac{2}{3} \Big( \sum_{j=2}^i B\big(i,j, \frac{1}{2}\big)\Big) $

$ = \frac{2}{3} \Big( \sum_{j=0}^i B\big(i,j, \frac{1}{2}\big)\Big) $

$ = \frac{2}{3} \cdot 1$

$ = \frac{2}{3} $

this implies a non-decreasing sequence when we apply $P$ many times to our guess vector $\mathbf x^{(n)}_{\text{'guess' 1 }} $, and this essentially gives the result, with no hard estimates involved.
