Why does my random experiment have a log normal distribution?

In summary, the conversation discusses a simulation that randomly picks six letters and displays the order in which they are picked. The goal is to find out how many iterations it takes to get a specific order of letters. The original poster found the results to look log normally distributed, having expected a normal distribution with an average of 360. Upon further discussion, it was determined that the number of iterations should follow a geometric distribution with probability of success p = 1/360, whose mean is 1/p = 360. The conversation also covers Bernoulli trials and how the probability of success determines the mean of the geometric distribution. The source for more information on this topic is suggested to be the Wikipedia article on
  • #1
musicgold
Hi,

I am confused by the results of a seemingly simple simulation that generates log normally distributed output. Please see the attached results file.

Simulation: I have built a Scratch program that randomly picks six letters from a group of six letters (A, B, C, D, E & E). The program displays the order in which the letters have been picked. I am interested in finding out how many iterations the program takes to get a specific order of letters (say, EEDCBA).
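For readers without Scratch, the experiment can be sketched in Python. This is only a minimal sketch, not the original program: it assumes each iteration is a uniformly random ordering of the six tiles (A, B, C, D, E, E), and the names `LETTERS`, `TARGET`, `iterations_until` and `run_experiment` are made up for illustration.

```python
import random

LETTERS = ["A", "B", "C", "D", "E", "E"]  # six tiles, E appears twice
TARGET = "EEDCBA"                         # the order we are waiting for


def iterations_until(target, rng):
    """Count random orderings drawn until one spells `target`."""
    tiles = LETTERS[:]
    count = 0
    while True:
        count += 1
        rng.shuffle(tiles)  # pick all six tiles in a random order
        if "".join(tiles) == target:
            return count


def run_experiment(repeats=100, seed=0):
    """Repeat the experiment `repeats` times and return the list of counts."""
    rng = random.Random(seed)
    return [iterations_until(TARGET, rng) for _ in range(repeats)]


if __name__ == "__main__":
    counts = run_experiment()
    print("sample mean:", sum(counts) / len(counts))
    print("min, max:", min(counts), max(counts))
```

With only 100 repeats the sample mean can land noticeably above or below 360, so a histogram of a single run should be read with some caution.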

I repeated this experiment 100 times and I was surprised to see log normally distributed results. I was hoping to see a normal distribution with an average of 360.

Can someone please explain what is going on?

Thanks,
 

Attachments

  • names.xls
    26.5 KB · Views: 208
  • #2
The actual distribution should be geometric with p=1/6^6 (i.e. counting the number of failures before a success).

I guess a smallish sample (size 100) would superficially resemble lognormal.
 
  • #3
I assume you understand that you cannot actually get either a lognormal or a normal distribution, as they are continuous and your r.v. is discrete.

If I understood your description, then your r.v. is just "the number of failures, before first success", where success is getting "EEDCBA" and trials are independent, right? In this case what you should get is the geometric distribution.

P.S. I can't open your Excel file, so I can't give you details of what it's doing wrong.
 
  • #4
Thanks folks.

For those who are not able to open my excel file, I have attached a text file with my results.

If I understood your description, then your r.v. is just "the number of failures, before first success", where success is getting "EEDCBA" and trials are independent, right?
That is correct.

Also, I got the 360 as follows: Prob of getting E in the first place = 2/6, prob of getting the second E in the second place = 1/5, prob of getting D in the third place = 1/4...and so on.


Probability of getting EEDCBA = 2/6 * 1/5 * 1/4 * 1/3 * 1/2 * 1 = 1/360
How is this number related to the distribution? Is it the mean of the distribution?
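The 1/360 figure can be cross-checked by brute force. The sketch below is an illustration rather than part of the original thread: it labels the two E tiles as `E1` and `E2` (hypothetical labels) so that all 6! orderings are equally likely, then counts how many of them spell EEDCBA.

```python
from fractions import Fraction
from itertools import permutations

# Label the two E tiles so that all 6! = 720 orderings are equally likely.
tiles = ["A", "B", "C", "D", "E1", "E2"]


def spelled(order):
    """Drop the labels: ('E1', 'E2', 'D', ...) -> 'EED...'."""
    return "".join(tile[0] for tile in order)


orderings = list(permutations(tiles))
hits = sum(1 for order in orderings if spelled(order) == "EEDCBA")

# Probability by counting, and by the step-by-step product from the post.
p_enumerated = Fraction(hits, len(orderings))
p_product = (Fraction(2, 6) * Fraction(1, 5) * Fraction(1, 4)
             * Fraction(1, 3) * Fraction(1, 2) * 1)

print(hits, "of", len(orderings), "orderings ->", p_enumerated)
print("product of conditional probabilities ->", p_product)
```

Both routes give the same fraction, because swapping the two E tiles yields two distinct tile orderings that spell the same word.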

Also, can you please point me to a source where I can read more about this? I am not sure why I should get a geometric distribution.
 

Attachments

  • Data.txt
    505 bytes · Views: 305
  • #5
musicgold said:
Thanks folks.
How is this number related to the distribution? Is it the mean of the distribution?

Also, can you please point me to a source where I can read more about this? I am not sure why I should get a geometric distribution.

p = 1/360 is the probability of success. See the Wikipedia article on the geometric distribution: it says "geometric distribution [...] is the probability distribution of the number X of Bernoulli trials needed to get one success". A Bernoulli trial is a trial that can have only two outcomes: 1 or 0 (or true/false, success/failure, etc.).

Also, if p is the probability of success, then 1-p is the probability of failure. In order to get the (first) success on the k-th trial (iteration), you need to fail k-1 times in a row and then have a success, thus [tex]P(X=k) = (1-p)^{k-1}p,[/tex] which is exactly the pmf of the geometric distribution.

Also, a geometrically distributed r.v. with parameter p has mean 1/p. So the average number of iterations should indeed be 360.
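The pmf and the mean can also be checked numerically. The following is a rough sketch under the assumption p = 1/360; the helper names `pmf` and `cdf` and the cutoff `K_MAX` are chosen here for illustration, and the infinite sums are simply truncated where the remaining tail is negligible.

```python
import math

p = 1 / 360  # probability of success on a single iteration


def pmf(k):
    """P(X = k): fail k-1 times in a row, then succeed on trial k."""
    return (1 - p) ** (k - 1) * p


def cdf(k):
    """P(X <= k) = 1 - P(the first k trials all fail)."""
    return 1 - (1 - p) ** k


# Truncate the infinite sums; the tail beyond K_MAX trials is negligible here.
K_MAX = 20_000
total = sum(pmf(k) for k in range(1, K_MAX + 1))
mean = sum(k * pmf(k) for k in range(1, K_MAX + 1))
variance = sum((k - mean) ** 2 * pmf(k) for k in range(1, K_MAX + 1))

# Median: the smallest k with P(X <= k) >= 1/2.
median = math.ceil(math.log(0.5) / math.log(1 - p))

print(f"total probability ~ {total:.6f}")
print(f"mean ~ {mean:.2f} (closed form 1/p = {1 / p:.2f})")
print(f"standard deviation ~ {math.sqrt(variance):.2f}")
print(f"median = {median}")
```

The mean comes out at 360, but the median is only 250 and the single most likely outcome is k = 1, with a standard deviation nearly as large as the mean. That strong right skew is why a histogram of about 100 runs can pass for a log normal shape.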
 

Related to Why my random experiment has a log normal distribution?

Why does my experiment have a log normal distribution?

There are a few potential reasons why your experiment may appear to have a log normal distribution. One is that your data is skewed to the right, meaning that most values are relatively small while a long tail extends toward large values. Many right-skewed distributions, including the geometric distribution from the thread above, can superficially resemble a log normal distribution in a small sample. Another reason could be that your experiment involves multiplicative processes, such as growth or decay, where the outcome is a product of many independent positive factors; such products tend toward a log normal distribution.

What is a log normal distribution?

A log normal distribution is the probability distribution of a positive random variable whose logarithm follows a normal distribution. The distribution is skewed to the right and has a long tail on the high end. It is commonly used to model data that arises from multiplicative processes, such as growth or decay.

How is a log normal distribution different from a normal distribution?

The main difference between a log normal distribution and a normal distribution is that the former is skewed to the right and takes only positive values, while the latter is symmetrical and takes values on the whole real line. The log normal distribution has a long tail on the high end, while the tails of the normal distribution fall off equally fast on both sides. This means that the log normal distribution is better suited for positive data with a wide range of values and a few extreme values.

Can my data fit both a log normal and a normal distribution?

Approximately, yes. When the spread of the data is small relative to its mean, a log normal distribution is nearly symmetrical and very close to a normal distribution, so both can fit about equally well. Otherwise one is usually clearly a better fit than the other, and you should choose the distribution that best represents your data.

How can I determine if my data follows a log normal distribution?

One way to determine if your data follows a log normal distribution is to take the logarithm of each value and check whether the transformed data looks normal, for example on a normal Q-Q plot. If the points appear to follow a straight line, then it is likely that a log normal distribution is a good fit. Additionally, normality tests such as the Shapiro-Wilk test can be applied to the log-transformed data. However, it is important to keep in mind that these tests have little power with small samples and flag even negligible deviations with very large ones, so visual inspection remains a useful complement.
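One way to put a number on the visual check is the correlation between the sorted data and the matching normal quantiles, which is the straightness of a normal Q-Q plot. The sketch below is an illustration only: `qq_correlation` is a made-up helper built on the standard library, and the synthetic sample is generated purely to show the contrast between raw and log-transformed values.

```python
import math
import random
from statistics import NormalDist


def qq_correlation(values):
    """Correlation between sorted values and standard normal quantiles."""
    xs = sorted(values)
    n = len(xs)
    # Plotting positions (i - 0.5) / n avoid the infinite quantiles at 0 and 1.
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = sum(xs) / n, sum(qs) / n
    cov = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_q = sum((q - mq) ** 2 for q in qs)
    return cov / math.sqrt(var_x * var_q)


# Synthetic log normal sample, used only for demonstration.
rng = random.Random(0)
lognormal_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]

r_raw = qq_correlation(lognormal_sample)
r_log = qq_correlation([math.log(x) for x in lognormal_sample])
print(f"raw data: r = {r_raw:.3f}")
print(f"log data: r = {r_log:.3f}")
```

A value of r close to 1 for the log-transformed data, and a clearly lower value for the raw data, points toward a log normal fit. Taking logarithms requires strictly positive data.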
