Mohammad
Hi everyone,
I have recently been intrigued by a seemingly simple problem: how to compare the averages of two groups of different sizes.
For example: suppose driver A wins 100 out of 200 races, and driver B wins 1 out of 2 races. Although the win rate is the same, driver A's achievement is clearly less likely to occur (so it can be considered more valuable?).
I worked out a solution based on the binomial distribution, using each driver's MLE (wins divided by races) as the success probability.
Pr(X = 100|1/2) = 0.0563 (N = 200)
Pr(X = 1|1/2) = 0.5 (N = 2)
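For anyone who wants to check these numbers, here is a minimal Python sketch of the calculation. The helper name `binom_pmf` is just something I made up for illustration; it is not from any library.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n independent races with win probability p.

    Hypothetical helper for illustration only.
    """
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

# Driver A: 100 wins out of 200 races, p = 1/2
prob_a = binom_pmf(100, 200, 0.5)
# Driver B: 1 win out of 2 races, p = 1/2
prob_b = binom_pmf(1, 2, 0.5)

print(f"Driver A: {prob_a:.4f}")
print(f"Driver B: {prob_b:.4f}")
```

Both drivers are evaluated at the same p = 1/2 here, so the two probabilities differ only because of the number of races.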
The result matches my expectation, as it indicates that the first event is less likely to occur. The problem, however, comes when I have a situation like this:
Driver A wins 65 out of 161 races.
Driver B wins 68 out of 244 races.
By evaluating the probabilities in the same way I got:
Pr(X = 65|65/161) = 0.0640
Pr(X = 68|68/244) = 0.0569
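The same calculation as a small Python sketch, in case it helps to see exactly what I computed. The names `binom_pmf` and `pmf_at_mle` are my own, for illustration only.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n independent races with win probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

def pmf_at_mle(wins: int, races: int) -> float:
    """Evaluate the binomial pmf at the observed count, using p = wins / races (the MLE)."""
    p_hat = wins / races
    return binom_pmf(wins, races, p_hat)

# Driver A: 65 wins out of 161 races
prob_a = pmf_at_mle(65, 161)
# Driver B: 68 wins out of 244 races
prob_b = pmf_at_mle(68, 244)

print(f"Driver A: p_hat = {65 / 161:.4f}, Pr = {prob_a:.4f}")
print(f"Driver B: p_hat = {68 / 244:.4f}, Pr = {prob_b:.4f}")
```

Note that, unlike the first example, each driver is evaluated at a different p here, so the two numbers are not measured against a common baseline.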
Intuitively, I reject this result because it is clear that driver A did a better job (because both drivers won almost the same number of races). I know it is probably because of the parameter I am using, but I don't know how to fix it.
Any thoughts?