Variance and goodness of fit tests

In summary: The chi-squared statistic, for example, can be used to test whether the observed values differ significantly from the expected values.
  • #1
bioman
I'm trying to see how well my data fit a certain probability distribution (an exponential distribution). Basically, I want to know how reliable it is to compare the theoretical variance of the distribution with the variance of the data as a way of assessing the goodness of fit of the data to the distribution.

For example, when I plot a histogram of the data and overlay the theoretical distribution, the fit is extremely good, and this is confirmed by a very high (~0.95) non-linear regression coefficient.

The odd thing, though, is that when I compute the variance of the data, it is completely different from the variance of the theoretical distribution: almost double it, every time. Should this be happening, given that the histogram and the regression both show a very good fit?

It's just that I have a very large sample size (~10,000), so I thought that if everything else fits well, the variance of the data should match the variance of the distribution.
So basically how reliable is the variance?
 
  • #2
What is the model you are estimating? Are you estimating Pr(x < z) = 1 - exp(-λz) as a function of z and then testing λ = 1 (or λ = λ*)? If not, why not?
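A minimal sketch of the approach suggested here, assuming i.i.d. exponential data: the maximum-likelihood estimate of λ is 1 / (sample mean), and a large-sample Wald test compares it with a hypothesised λ*. The function name `exponential_rate_wald_test` and the simulated data are illustrative only. Note that this test depends on the data only through the sample mean, so on its own it would not detect a variance that is twice the predicted value.

```python
import math
import random
import statistics


def exponential_rate_wald_test(sample, rate_null):
    """Wald test of H0: rate == rate_null for i.i.d. exponential data (hypothetical helper).

    The MLE of the rate is 1 / sample mean; its asymptotic standard
    error is rate_hat / sqrt(n).
    """
    n = len(sample)
    rate_hat = 1.0 / statistics.fmean(sample)
    se = rate_hat / math.sqrt(n)
    z = (rate_hat - rate_null) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # normal two-sided tail
    return rate_hat, z, p_two_sided


# Simulated (hypothetical) data: 10,000 draws with true rate 2, i.e. mean 0.5.
rng = random.Random(0)
data = [rng.expovariate(2.0) for _ in range(10_000)]

rate_hat, z, p = exponential_rate_wald_test(data, rate_null=2.0)
print(f"rate_hat={rate_hat:.4f}  z={z:.2f}  p={p:.3f}")
```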
 
  • #3
OK, perhaps I wasn't very clear.
My data come from a model which says that, in theory, the data should follow an exponential distribution with mean [tex]\mu[/tex]. So I'm simply trying to assess the goodness of fit of the data to an exponential distribution with mean [tex]\mu[/tex]. Plotting the data (histogram) together with the exponential distribution gives a very good fit (and a very high regression coefficient), so I presumed that the variance of the data should match the variance of the exponential distribution, i.e. [tex]\mu^2[/tex]. It doesn't: most of the time it is almost double the 'predicted' variance. Is this normal? That is, should I expect the variance of the data to equal the variance predicted by the exponential distribution, given that the graphs show a very good fit?
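How reliable is the sample variance with n ≈ 10,000? For an exponential distribution with mean μ the fourth central moment is 9μ⁴, so the sample variance s² has a large-sample standard error of roughly μ²·sqrt(8/n), about 2.8% of μ² at n = 10,000. The sketch below assumes i.i.d. data and μ known from the model; the helper name `variance_ratio_z` is hypothetical.

```python
import math
import random
import statistics


def variance_ratio_z(sample, mu):
    """Large-sample z-score for H0: Var(X) == mu**2 under an exponential model.

    For an exponential distribution the fourth central moment is 9 * mu**4,
    so Var(s^2) is approximately (9 - 1) * mu**4 / n = 8 * mu**4 / n.
    """
    n = len(sample)
    ratio = statistics.variance(sample) / mu**2  # s^2 / mu^2, close to 1 under H0
    se_ratio = math.sqrt(8.0 / n)                # about 0.028 when n = 10,000
    return ratio, (ratio - 1.0) / se_ratio


# Simulated exponential data with the model's mean (hypothetical values).
rng = random.Random(1)
mu = 1.0
clean = [rng.expovariate(1.0 / mu) for _ in range(10_000)]
ratio, z = variance_ratio_z(clean, mu)
print(f"ratio={ratio:.3f}  z={z:.2f}")

# A variance ratio of 2 with n = 10,000 lies about 35 standard errors from 1.
print((2.0 - 1.0) / math.sqrt(8.0 / 10_000))
```

Under these assumptions, a variance that is consistently near 2μ² is far outside sampling noise, so it suggests the data are not purely exponential (for example, an additional error component, as discussed in post #4).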
 
  • #4
My guess is that your data have an error component and it is inflating the variance. Not knowing anything else, I'll call it the measurement error.

Suppose I am going to draw 4 values from some distribution. The expected values of my draws (i.e. the expected order statistics) are x(i) = -2, -1, 1, 2. The realized values have a random component r, driven by the underlying theoretical distribution, and also a measurement error ε, so y*(i) = y(i) + ε(i) = x(i) + r(i) + ε(i). Suppose the realizations are y*(i) = -2.18, -1.88, 1.54, 1.65. The correlation between x and y* is 0.96, so you might say that there is a "good fit," but var(y*) = 4.4 vs. var(x) = 3.3 (sample variances, dividing by n - 1).

In the absence of a measurement error, suppose y(i) = -2.09, -1.44, 1.27, 1.83 (values that are "unobservable" to mere humans, though the probabilistic creatures who hang out in this forum can see them). Then Corr(x, y) = 0.99, so the fit is somewhat better; more importantly, var(y) = 3.79, which is less than var(y*) and much closer to var(x).
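The figures in this example can be reproduced with Python's standard library. The sketch assumes, as the quoted numbers imply, that variances use the n - 1 denominator; `pearson` is a small helper written here rather than a library call.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient (written out to avoid needing Python 3.10+)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = math.sqrt(sum((ai - ma) ** 2 for ai in a))
    sb = math.sqrt(sum((bi - mb) ** 2 for bi in b))
    return cov / (sa * sb)


x = [-2.0, -1.0, 1.0, 2.0]            # expected values of the draws
y_star = [-2.18, -1.88, 1.54, 1.65]   # observed values, including measurement error
y = [-2.09, -1.44, 1.27, 1.83]        # "unobservable" values without the error

# statistics.variance divides by n - 1.
print(round(statistics.variance(x), 1))        # 3.3
print(round(statistics.variance(y_star), 1))   # 4.4
print(round(statistics.variance(y), 2))        # 3.79
print(round(pearson(x, y_star), 2))            # 0.96
print(round(pearson(x, y), 2))                 # 0.99
```

The correlations stay high in both cases, while the error term visibly inflates the variance, which is the point of the example.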

You may want to look at other tests for goodness of fit.
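One such test is the one-sample Kolmogorov-Smirnov test, which compares the empirical CDF of the data with the exponential CDF 1 - exp(-x/μ). Below is a minimal standard-library sketch that uses the asymptotic Kolmogorov distribution for the p-value; the function names are hypothetical. It assumes μ is fixed in advance by the model; if μ is estimated from the same data, these p-values are too conservative and a Lilliefors-type correction is needed.

```python
import math
import random


def ks_statistic_exponential(sample, mu):
    """One-sample Kolmogorov-Smirnov statistic D against Exponential(mean=mu)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-x / mu)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def kolmogorov_pvalue(t, terms=100):
    """Asymptotic P(sqrt(n) * D > t) = 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 k^2 t^2)."""
    if t <= 0:
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))  # clamp truncation error for very small t


# Simulated (hypothetical) data drawn from the hypothesised distribution.
rng = random.Random(2)
mu = 1.0
data = [rng.expovariate(1.0 / mu) for _ in range(10_000)]
d = ks_statistic_exponential(data, mu)
p = kolmogorov_pvalue(math.sqrt(len(data)) * d)
print(f"D={d:.4f}  p={p:.3f}")
```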
 

Related to Variance and goodness of fit tests

1. What is variance?

Variance is a measure of how spread out a set of data is. It specifically measures how much the data points deviate from the mean or average value. A higher variance indicates that the data points are more spread out, while a lower variance indicates that the data points are closer together.

2. How is variance calculated?

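Variance is calculated by summing the squared differences between each data point and the mean, then dividing that sum by the total number of data points N (the population variance). When the data are a sample used to estimate the variance of a larger population, the sum is usually divided by n - 1 instead (the sample variance).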
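A short sketch of the two denominators, cross-checked against `statistics.pvariance` and `statistics.variance` from Python's standard library (the helper names are illustrative):

```python
import statistics


def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    m = sum(data) / len(data)
    return sum((v - m) ** 2 for v in data) / len(data)


def sample_variance(data):
    """Same sum of squares divided by n - 1 (unbiased for a random sample)."""
    m = sum(data) / len(data)
    return sum((v - m) ** 2 for v in data) / (len(data) - 1)


values = [-2.0, -1.0, 1.0, 2.0]
print(population_variance(values))   # 2.5
print(sample_variance(values))       # 3.3333333333333335

# Cross-check against the standard library.
print(statistics.pvariance(values), statistics.variance(values))
```

With only four points the two versions differ noticeably; with around 10,000 points the difference is negligible.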

3. What is a goodness of fit test?

A goodness of fit test is a statistical test that is used to determine how well a set of data fits a particular distribution or model. It compares the observed data to the expected data based on the chosen distribution or model, and provides a measure of how closely the two match.

4. Why is a goodness of fit test important?

A goodness of fit test is important because it allows us to determine if a particular distribution or model is a good fit for our data. This can help us make more accurate predictions and draw meaningful conclusions from our data.

5. What are some common examples of goodness of fit tests?

Some common examples of goodness of fit tests include the chi-square test, the Kolmogorov-Smirnov test, and the Anderson-Darling test. The chi-square test works with binned or discrete data (for example, a Poisson distribution), while the Kolmogorov-Smirnov and Anderson-Darling tests are designed for continuous distributions such as the normal or exponential distribution.
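As a sketch of the chi-square test for the exponential case, assuming μ is fixed by the model: map each value through the exponential CDF, count the values falling in k equiprobable bins, and compare with the expected n/k per bin. With k = 11 bins there are k - 1 = 10 degrees of freedom, and for an even number of degrees of freedom the chi-square tail probability has a closed form, so no external library is needed. The function names and the choice k = 11 are assumptions for illustration; if μ were estimated from the data, one further degree of freedom would be lost.

```python
import math
import random


def chi_square_exponential(sample, mu, k=11):
    """Pearson chi-square statistic with k equiprobable bins under Exponential(mean=mu)."""
    counts = [0] * k
    for x in sample:
        u = 1.0 - math.exp(-x / mu)              # probability integral transform
        j = min(max(int(u * k), 0), k - 1)       # bin j covers u in [j/k, (j+1)/k)
        counts[j] += 1
    expected = len(sample) / k
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_sf_even_df(x, df):
    """P(chi^2_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)**i / i!."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Simulated (hypothetical) data from the hypothesised exponential distribution.
rng = random.Random(3)
mu, k = 1.0, 11
data = [rng.expovariate(1.0 / mu) for _ in range(10_000)]
stat = chi_square_exponential(data, mu, k)
p = chi2_sf_even_df(stat, k - 1)   # df = k - 1 because mu is fixed by the model
print(f"chi2={stat:.2f}  p={p:.3f}")
```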
