- #1
mikeph
Say I have a model, y = f(x), and ten discrete data points to compare with it, (x1, y1), ..., (x10, y10). The usual approach is to square the residuals and average them to get a measure of the quality of fit, i.e.
average residuals squared = {[f(x1) - y1]^2 + ... + [f(x10) - y10]^2}/10
I also remember being told that if this value is minimised then the model f(x) is the best estimate of the data, assuming the data contains only Gaussian noise?
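To make the discrete case concrete, here is a minimal sketch of the average squared residual. The straight-line model, the noise level, and the helper name `mean_squared_residual` are all assumptions chosen for illustration, not part of any particular data set.

```python
import numpy as np

def mean_squared_residual(f, x, y):
    """Average of [f(x_i) - y_i]^2 over the data points."""
    r = f(x) - y
    return float(np.mean(r ** 2))

# Hypothetical data: ten points from y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# np.polyfit with degree 1 returns the least-squares straight line,
# i.e. the slope and intercept that minimise the sum of squared residuals.
slope, intercept = np.polyfit(x, y, 1)
best = mean_squared_residual(lambda t: slope * t + intercept, x, y)

# Any other straight line gives a larger average squared residual.
other = mean_squared_residual(lambda t: (slope + 0.3) * t + intercept, x, y)
print(best, other)
```

Here `best` should come out smaller than `other`, since the fitted line is by construction the minimiser within the family of straight lines.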
Say instead my data were continuous (for whatever reason). Is it equally rigorous to minimise the continuous sum of the squared residuals? For example, if my data is y = g(x) on a range a <= x <= b, then the continuous version of the residual is
average residual squared = [integral from a to b of (f(x) - g(x))^2 dx] / (b - a).
Does this make sense? Is this the correct approach to comparing a continuous data set with a model?
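A rough numerical sketch of the continuous version, under assumptions picked purely for illustration: the data is g(x) = x^2 on [0, 1], the model is a line through the origin f(x) = slope * x, and the integral is approximated with the trapezoidal rule on a fine grid. The function names are hypothetical.

```python
import numpy as np

def integrated_squared_residual(f, g, a, b, n=10001):
    """Trapezoidal approximation of the integral of (f(x) - g(x))^2 over [a, b]."""
    x = np.linspace(a, b, n)
    v = (f(x) - g(x)) ** 2
    return float(np.sum((v[1:] + v[:-1]) * np.diff(x)) / 2.0)

def average_squared_residual(f, g, a, b, n=10001):
    """Integrated squared residual divided by the length of the interval."""
    return integrated_squared_residual(f, g, a, b, n) / (b - a)

# Assumed continuous data and a one-parameter family of models.
g = lambda x: x ** 2

def model(slope):
    return lambda x: slope * x

# Scan candidate slopes and keep the one with the smallest integral.
slopes = np.linspace(0.0, 1.5, 151)
values = [integrated_squared_residual(model(s), g, 0.0, 1.0) for s in slopes]
best_slope = float(slopes[int(np.argmin(values))])
print(best_slope)
```

For this particular choice the integral works out by hand to slope^2/3 - slope/2 + 1/5, which is smallest at slope = 3/4, so the scan can be checked against that. Dividing by (b - a) rescales the value but does not change which model minimises it.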
Thanks
Edit: I can maybe put this a better way. Rather than only comparing the data to f(x) at the points where we have measured data, which seems a bit biased to me, why don't we compare over the entire range of x? We could say "the most we can obtain from our data is that the function looks like a stepwise function with step heights equal to y1, y2, ...", and then compute the residual in terms of the area between the model and the stepwise function.
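The stepwise idea can be sketched as follows. One assumption has to be made about where the steps change height; this sketch places the boundaries at the midpoints between neighbouring x values (nearest-neighbour steps), which is only one possible choice. The data and function names are hypothetical.

```python
import numpy as np

def step_function(x_data, y_data):
    """Piecewise-constant function that takes the height y_i of the nearest x_i.

    Assumption: step boundaries sit at the midpoints between neighbouring x_i.
    """
    x_data = np.asarray(x_data, dtype=float)
    y_data = np.asarray(y_data, dtype=float)
    edges = (x_data[1:] + x_data[:-1]) / 2.0

    def g(x):
        # searchsorted gives, for each x, the index of the step it falls in.
        return y_data[np.searchsorted(edges, x)]

    return g

def step_squared_residual(f, x_data, y_data, n=20001):
    """Average of (f(x) - step(x))^2 over [x_1, x_N], by the trapezoidal rule."""
    g = step_function(x_data, y_data)
    x = np.linspace(x_data[0], x_data[-1], n)
    v = (f(x) - g(x)) ** 2
    integral = np.sum((v[1:] + v[:-1]) * np.diff(x)) / 2.0
    return float(integral / (x_data[-1] - x_data[0]))

# Hypothetical noise-free data lying exactly on the model y = 2x.
x_data = np.linspace(0.0, 1.0, 10)
y_data = 2.0 * x_data
f = lambda x: 2.0 * x

discrete = float(np.mean((f(x_data) - y_data) ** 2))  # zero: f hits every point
stepwise = step_squared_residual(f, x_data, y_data)   # greater than zero
print(discrete, stepwise)
```

In this example the discrete residual is zero because the model passes through every point, while the stepwise residual is not, because the model is also being compared with the flat steps between the points. So the two measures are not equivalent, and the stepwise one depends on how the steps are drawn.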