# Unsolved statistics questions from other sites, part II

#### chisigma

##### Well-known member
Re: Unsolved statistic questions from other sites, part II

The r.v. $Z_{2}$ of course is related to the r.v. $\displaystyle X= \sum_{i=1}^{n} X_{i}$, where the $X_{i}$ are all uniformly distributed from 1 to N. Also in this case the problem is relatively easy if some approximation is allowed, so that we adopt the Central Limit Theorem as in...

http://www.mathhelpboards.com/f23/unsolved-statistics-questions-other-sites-932/index9.html#post7147

Each $X_{i}$ has mean...

$\displaystyle \mu_{i}= \frac{1}{N}\ \sum_{k=1}^{N} k = \frac {N + 1}{2}$ (1)

... and variance...

$\displaystyle \sigma^{2}_{i}= \frac{1}{N}\ \sum_{k=1}^{N} k^{2} - \frac{(N+1)^{2}}{4} = \frac{(N+1)\ (2\ N+1)}{6} - \frac{(N+1)^{2}}{4} = \frac{N^{2} - 1}{12}$ (2)

... so that $Z_{2}$ has mean...

$\displaystyle \mu_{2} \sim N+1$ (3)

... and standard deviation...

$\displaystyle \sigma_{2} \sim \sqrt {\frac{N^{2} - 1}{3\ n}}$ (4)

In the previous post we found that $Z_{1}$ has mean...

$\displaystyle \mu_{1} = N$ (5)

... and standard deviation...

$\displaystyle \sigma_{1} \sim N\ \sqrt {\frac{1}{n\ (n+2)}}$ (6)

In order to establish which is the 'better estimator' we define a sort of 'quality factor' $\displaystyle \alpha= \frac{\sigma}{\mu}$ and obtain for $Z_{1}$...

$\displaystyle \alpha_{1} \sim \sqrt {\frac{1}{n\ (n+2)}}$ (7)

... and for $Z_{2}$, when N is large...

$\displaystyle \alpha_{2} \sim \sqrt {\frac{1}{3\ n}}$ (8)

The conclusion is: $Z_{1}$ is the better estimator...

Kind regards

$\chi$ $\sigma$
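As a quick check of the moments used in this comparison, here is a minimal Python sketch that computes the mean and variance of a single $X_{i}$ exactly from the defining sums and then evaluates the two 'quality factors'. It assumes that $Z_{2}$ is twice the sample mean of the n draws (which is consistent with a mean of $N+1$); the function names are illustrative only, and the expression for $Z_{1}$ is simply the one quoted from the previous post.

```python
from fractions import Fraction
from math import sqrt

def discrete_uniform_moments(N):
    """Exact mean and variance of a r.v. uniform on {1, ..., N}."""
    mean = Fraction(sum(range(1, N + 1)), N)
    second_moment = Fraction(sum(k * k for k in range(1, N + 1)), N)
    return mean, second_moment - mean ** 2

def alpha_2(N, n):
    """sigma/mu for Z_2, ASSUMING Z_2 = 2 * (sample mean of n draws)."""
    mean, var = discrete_uniform_moments(N)
    return sqrt(float(4 * var / n)) / float(2 * mean)

def alpha_1(n):
    """sigma/mu for Z_1, using the expression quoted from the previous post."""
    return sqrt(1.0 / (n * (n + 2)))

if __name__ == "__main__":
    print(discrete_uniform_moments(10))
    for n in (2, 5, 10, 50):
        print(n, alpha_1(n), alpha_2(1000, n))
```

For every n greater than 1 the first factor comes out smaller than the second, in agreement with the conclusion of the post.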

#### chisigma

##### Well-known member

Posted on www.artofproblemsolving.com on 19/27/2012 by the user newsum and not yet solved…

Let X and Y have bivariate normal distribution function with parameters $\mu_{1}=3$, $\mu_{2}= 1$, $\sigma_{1}^{2}= 16$, $\sigma_{2}^{2}= 25$ and $\rho=.6$. Determine…

a) $\displaystyle P\{ 3 < Y < 8 \}$

b) $\displaystyle P\{ 3 < Y < 8 | X < 7 \}$

c) $\displaystyle P\{ -3 < Y < 3 \}$

d) $\displaystyle P\{ -3 < Y < 3| Y = -4 \}$

Kind regards

$\chi$ $\sigma$
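Parts a) and c) involve only the marginal distribution of Y, so they can be sketched with nothing more than the normal CDF. The Python sketch below assumes that $(\mu_{1}, \sigma_{1})$ refer to X and $(\mu_{2}, \sigma_{2})$ to Y. For d) the condition is printed as $Y = -4$; the sketch assumes, and this is only a guess, that $X = -4$ was intended, and uses the standard conditional law of a bivariate normal. Part b) needs a two-dimensional integral and is not covered here.

```python
from math import erf, sqrt

MU_X, MU_Y = 3.0, 1.0      # assumed: mu_1 belongs to X, mu_2 to Y
SD_X, SD_Y = 4.0, 5.0      # square roots of the given variances
RHO = 0.6

def norm_cdf(z):
    """Standard normal CDF expressed through erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def interval_prob(lo, hi, mean, sd):
    """P(lo < V < hi) for V ~ N(mean, sd^2)."""
    return norm_cdf((hi - mean) / sd) - norm_cdf((lo - mean) / sd)

def conditional_y_given_x(x):
    """Mean and standard deviation of Y given X = x (bivariate normal)."""
    mean = MU_Y + RHO * SD_Y / SD_X * (x - MU_X)
    sd = SD_Y * sqrt(1.0 - RHO ** 2)
    return mean, sd

p_a = interval_prob(3, 8, MU_Y, SD_Y)
p_c = interval_prob(-3, 3, MU_Y, SD_Y)

# ASSUMPTION: part d) is read as conditioning on X = -4
m, s = conditional_y_given_x(-4.0)
p_d = interval_prob(-3, 3, m, s)

if __name__ == "__main__":
    print("a)", p_a)
    print("c)", p_c)
    print("d), assuming X = -4:", p_d)
```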

#### chisigma

##### Well-known member

Posted the 12 08 2012 on www.artofproblemsolving.com by the user inakamono and not yet solved…

Find the probability that among 10000 random digits the digit 7 appears not more than 968 times…

Kind regards

$\chi$ $\sigma$

#### chisigma

##### Well-known member
That is a classical example of cumulative binomial distribution... the probability of k events in n trials is...

$\displaystyle P_{n,k}= \binom {n}{k}\ p^{k}\ (1-p)^{n-k}$ (1)

... so that the requested probability is...

$\displaystyle P = \sum_{k=0}^{968} P_{n,k}$ (2)

... with $p=.1$ and $n=10000$. The direct computation of (2) of course requires a computer tool like...

Binomial Calculator

... that gives $P = .1467...$ . Alternatively we can approximate (1) with $\displaystyle P_{n,k} \sim N (\mu, \sigma^{2})$ where...

$\displaystyle \mu= n\ p\ ,\ \sigma^{2}= n\ p\ (1-p)$ (3)

... so that the requested probability is...

$\displaystyle P \sim \frac{1}{2}\ \{1 + \text{erf} (\frac {968 - \mu}{\sigma\ \sqrt{2}})\}$ (4)

Also in this case you need a computer tool for the computation of (4)... 'Monster Wolfram' gives $P \sim .143061$...

Kind regards

$\chi$ $\sigma$

#### CaptainBlack

##### Well-known member
In the normal approximation you have not used the continuity correction. The 968 should be replaced by 968.5, which gives a probability of about 0.1469.

And you don't need a computer to evaluate it; tables are quite adequate.

CB
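The three numbers discussed in this exchange (the exact binomial sum, the normal approximation at 968, and the same approximation with the continuity correction at 968.5) can be compared with a short Python sketch. The function names are illustrative, and the log-space summation is just one convenient way to avoid underflow, not necessarily what the online calculator does.

```python
from math import erf, exp, lgamma, log, sqrt

def binom_cdf(k_max, n, p):
    """P(X <= k_max) for X ~ Binomial(n, p), summed in log space."""
    total = 0.0
    for k in range(k_max + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1.0 - p))
        total += exp(log_pmf)   # tiny terms simply underflow to 0.0
    return total

def normal_cdf_approx(x, n, p):
    """Normal approximation of P(X <= x) with mu = n p, sigma^2 = n p (1 - p)."""
    mu = n * p
    sigma = sqrt(n * p * (1.0 - p))
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

if __name__ == "__main__":
    n, p = 10000, 0.1
    print("exact binomial sum      :", binom_cdf(968, n, p))
    print("normal, no correction   :", normal_cdf_approx(968, n, p))
    print("normal, with correction :", normal_cdf_approx(968.5, n, p))
```

With the correction, the normal approximation lands much closer to the exact sum than without it.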

#### chisigma

##### Well-known member
Unfortunately my personal experience of more than thirty-five years in the area of telecommunications doesn't agree with this point of view. In the 'Bible' of Abramowitz and Stegun...

Abramowitz and Stegun: Handbook of Mathematical Functions

... the table of the normalized integral...

$\displaystyle \Phi(x)= \frac{1}{\sqrt{2\ \pi}}\ \int_{- \infty}^{x} e^{- \frac{t^{2}}{2}}\ dt$ (1)

... goes only up to x=5 and supplies the value $\Phi (5) \sim .9999997133 \implies 1 - \Phi (5) \sim 2.867 \cdot 10^{-7}$. Well!... in digital transmission a standard bit error rate not greater than $10^{-6}$ is required, and that means that, in order to have the necessary 'system margin', a target bit error rate of $10^{-8} - 10^{-9}$ is often required... and even less in the case of an optical fibre link...

At this point it is clear that the use of tables was not adequate for me, so that many years ago I composed, with 'patient' application of the Simpson rule, the following 'little but accurate table' of the function $\log_{10} \text{erfc} (x)$, where 'erfc(x)' is defined as ...

$\displaystyle \text{erfc} (x) = 1 - \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{- t^{2}}\ dt$ (2)

[Attachment 505: table of $\log_{10} \text{erfc} (x)$]

Maybe, sooner or later, in a dedicated post, I will explain the 'little accurate table' better and indicate an easy way to turn it into a 'little computer program'...

Kind regards

$\chi$ $\sigma$
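For readers who want to reproduce tail values of this size today, a minimal Python sketch follows. It relies on `math.erfc` from the standard library rather than on Simpson's rule, so it is not the method described in the post; the helper names are illustrative.

```python
from math import erfc, log10, sqrt

def normal_tail(x):
    """Upper tail of the standard normal, i.e. 1 minus the normalized integral."""
    return 0.5 * erfc(x / sqrt(2.0))

def log10_erfc(x):
    """log10 of erfc(x); usable as long as erfc(x) has not underflowed to 0."""
    return log10(erfc(x))

if __name__ == "__main__":
    print("tail at x = 5:", normal_tail(5.0))
    for x in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0):
        print(x, log10_erfc(x))
```

Computing the tail through `erfc` directly, instead of as one minus a value close to 1, is what keeps the small probabilities accurate.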

#### CaptainBlack

##### Well-known member
Then get a better table; mine goes to $$z=9.5$$ with a tail probability of $$\sim 10^{-21}$$. Also, A&S give pretty good asymptotic representations for the extreme tails of the normal distribution (26.2.12 and following sections).

Also, the suggestion of using a normal calculator may be less than useless to a student who meets such a problem where they do not have access to calculation aids but may have an exam handbook with a table.

CB


#### chisigma

##### Well-known member
That's not a very difficult task if we use the formula in...

Erfc -- from Wolfram MathWorld

$\displaystyle \frac{2}{\sqrt{\pi}}\ \frac{e^{- x^{2}}}{x + \sqrt{x^{2}+2}} < \text{erfc} (x) \le \frac{2}{\sqrt{\pi}}\ \frac{e^{- x^{2}}}{x + \sqrt{x^{2}+\frac{4}{\pi}}}$ (1)

... which gives an 'upper bound' and a 'lower bound' of the function. In the figure...

... only the 'upper bound' is shown because the 'lower bound' in logarithmic scale is hard to distinguish from it. Of course the only limitation in proceeding is the size of the diagram. It seems that the agreement with my old computation is good enough...

Kind regards

$\chi$ $\sigma$
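The two bounds in (1) are easy to evaluate numerically. The Python sketch below tabulates them next to `math.erfc` for a few positive values of x; the function name `erfc_bounds` is illustrative, and the comparison with the library function is an addition for checking purposes, not part of the original computation.

```python
from math import erfc, exp, pi, sqrt

def erfc_bounds(x):
    """Lower and upper bound of erfc(x) from inequality (1), for x > 0."""
    common = 2.0 / sqrt(pi) * exp(-x * x)
    lower = common / (x + sqrt(x * x + 2.0))
    upper = common / (x + sqrt(x * x + 4.0 / pi))
    return lower, upper

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 3.0, 5.0):
        lower, upper = erfc_bounds(x)
        # Relative width of the bracket shrinks as x grows
        print(x, lower, erfc(x), upper, (upper - lower) / upper)
```

The bracket tightens in relative terms as x increases, which is why the two curves are hard to tell apart on a logarithmic scale.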


#### chisigma

##### Well-known member

Posted the 12 12 2012 [the 'magic date' of the Maya's calendar!...] on www.mathhelpforum.com by the user asilvester635 and not yet solved…

While he was a prisoner of war during World War II, John Kerrich tossed a coin 10,000 times. He got 5067 heads. If the coin is perfectly balanced, the probability of a head is 0.5. Is there reason to think that Kerrich's coin was not balanced?... To answer this question use a normal distribution to estimate the probability that tossing a balanced coin 10,000 times would give a count of heads at least this far from 5000, that is, at least 5067 heads or no more than 4933 heads…

The problem is very similar to the one treated in...

http://www.mathhelpboards.com/f23/unsolved-statistic-questions-other-sites-part-ii-1566/index3.html

... and the requested probability is...

$\displaystyle P \sim \text {erfc} (\frac {5067.5 - \mu}{\sigma\ \sqrt{2}})$ (1)

... where $\mu = 10000\ p = 5000$ and $\sigma= \sqrt{10000\ p\ (1-p)}= 50$ . For $x = .9546$ 'MonsterWolfram' supplies $\displaystyle \text{erfc} (x) \sim .177$: a balanced coin would give a deviation at least this large about 18% of the time, so there is no strong reason to think that Kerrich's coin was unbalanced. The purpose of this post however is to verify the possibility of using the approximate value of erfc(*) described in...

http://www.mathhelpboards.com/f23/u...ther-sites-part-ii-1566/index4.html#post12076

... by the formula...

$\displaystyle \frac{2}{\sqrt{\pi}}\ \frac{e^{- x^{2}}}{x + \sqrt{x^{2}+ 2}} < \text {erfc} (x) \le \frac{2}{\sqrt{\pi}}\ \frac{e^{- x^{2}}}{x + \sqrt{x^{2}+ \frac{4}{\pi}}}$ (2)

Using a normal handheld calculator for $x = .9546$ we find...

$\displaystyle .170483 < \text{erfc(.9546)} < .186478$

... and taking the arithmetic mean $\text{erfc(.9546)} \sim .1784$, a result 'good enough' obtained without using tables...

Kind regards

$\chi$ $\sigma$
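As a cross-check of the normal estimate, the two-sided probability for a fair coin can also be summed exactly with integer arithmetic. The Python sketch below does this and evaluates the erfc expression at a boundary passed as a parameter. The boundary 5067.5 is the one used in the post; 5066.5 is the alternative that counts the outcome 5067 itself, and it is included here only for comparison. The function names are illustrative.

```python
from math import erfc, sqrt

def two_sided_normal(boundary, n=10000, p=0.5):
    """erfc((boundary - mu) / (sigma * sqrt(2))) with mu = n p, sigma^2 = n p (1 - p)."""
    mu = n * p
    sigma = sqrt(n * p * (1.0 - p))
    return erfc((boundary - mu) / (sigma * sqrt(2.0)))

def two_sided_exact(k_hi, n=10000):
    """Exact P(X >= k_hi or X <= n - k_hi) for a fair coin (needs k_hi > n / 2)."""
    term, lower_tail = 1, 0               # term holds C(n, k), starting at k = 0
    for k in range(n - k_hi + 1):         # k = 0 .. n - k_hi
        lower_tail += term
        term = term * (n - k) // (k + 1)  # exact step from C(n, k) to C(n, k + 1)
    # By symmetry the upper tail equals the lower tail
    return 2 * lower_tail / 2 ** n

if __name__ == "__main__":
    print("normal, boundary 5067.5:", two_sided_normal(5067.5))
    print("normal, boundary 5066.5:", two_sided_normal(5066.5))
    print("exact two-sided sum    :", two_sided_exact(5067))
```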

#### chisigma

##### Well-known member

Posted on 12 15 2012 on www.artofproblemsolving.com by the member BlackMax and not yet solved...

Three points are uniformly and independently chosen inside a given circle. What is the probability that their circumcircle lies entirely within the given circle?... a C++ program suggests that the answer is most likely to be .4 ...

Kind regards

$\chi$ $\sigma$

#### chisigma

##### Well-known member
Clearly a 'direct' attack on this problem is a little uncomfortable so that I'll try an 'indirect' attack. Let's suppose that the circle is the unit circle and that a 'circumcentre point' can be represented by the distance r from the point [0,0], as in the figure...

If we fix the circumcentre, then the measure of the set of possible 'random points' is the area of the 'red circle' in the figure, so that the requested probability is given by the 'simple' computation...

$\displaystyle P = \int_{0}^{1} (1-r)^{2}\ d r = \frac{1}{3}$ (1)

Honestly however I'm not 'fully certain' of my solution and suggestions and/or comments from MHB members are welcome...

Kind regards

chi sigma

#### Bacterius

##### Well-known member
MHB Math Helper

Chisigma, I don't believe your answer is correct. You seem to be assuming the circumcentre has the same distribution as the three randomly selected points, but it clearly doesn't follow the same distribution. Consider what happens when the three random points are almost collinear, for instance (the circumcircle grows much larger than the unit circle).

I wrote a little Python 3.2 script to try and calculate the probability of the circumcircle of three random points in the unit circle being fully contained in the unit circle:

Code:
    from math import cos, sin, sqrt, pi
    from random import random

    def RandomPoints(n):
        ''' Generates 'n' random points uniformly distributed in the unit circle. '''
        points = []

        for t in range(n):

            # Uniform polar distribution (the square root keeps the density uniform in area)
            theta = random() * 2 * pi
            radius = sqrt(random())

            # Convert to cartesian
            x = radius * cos(theta)
            y = radius * sin(theta)

            points.append({'x': x, 'y': y})

        return points

    def Circumcircle(a, b, c):
        ''' Returns the circumcircle of three points a, b, c. '''

        # Translate such that vertex a is at the origin
        u = {'x': b['x'] - a['x'], 'y': b['y'] - a['y']}
        v = {'x': c['x'] - a['x'], 'y': c['y'] - a['y']}

        # Precompute a few values (d is zero only for collinear points)
        d = 2 * (u['x'] * v['y'] - u['y'] * v['x'])
        e = pow(u['x'], 2) + pow(u['y'], 2)
        f = pow(v['x'], 2) + pow(v['y'], 2)

        # Compute the circumcentre's translated coordinates
        centre = {'x': (v['y'] * e - u['y'] * f) / d,
                  'y': (u['x'] * f - v['x'] * e) / d}

        # Compute the circumcircle's radius (note: translated)
        radius = sqrt(pow(centre['x'], 2) + pow(centre['y'], 2))

        # Translate the circumcentre back into the original space
        centre = {'x': centre['x'] + a['x'], 'y': centre['y'] + a['y']}

        return {'centre': centre, 'radius': radius}

    def InUnitCircle(circle):
        ''' Verifies if the given circle is fully contained by the unit circle. '''

        # Distance of the circle's centre from the origin
        d = sqrt(pow(circle['centre']['x'], 2) + pow(circle['centre']['y'], 2))

        # Contained exactly when the farthest point of the circle is within distance 1
        return d + circle['radius'] <= 1

    def Experiment(trials):
        ''' Returns the fraction of trials whose circumcircle is contained. '''
        passed = 0

        for t in range(trials):

            p = RandomPoints(3)
            c = Circumcircle(p[0], p[1], p[2])

            if InUnitCircle(c):
                passed += 1

        return passed / trials

    # Experiment script (the largest trial counts take a very long time)
    for t in range(2, 10):
        prob = Experiment(pow(10, t))
        print("After " + str(pow(10, t)) + " trials, P ~ " + str(prob))
These are the probabilities I measured:

Code: