Comparing discrete data to a continuous model (1D)

In summary, the best way to fit a model to data depends on how one defines "best" and what assumptions are made about the data. Common definitions of "best" include minimizing the sum of squared residuals at the data points, minimizing the integral of the squared difference between the model and a continuous version of the data, or minimizing the squared residuals between a fitted cumulative distribution and the cumulative distribution of the data. Each method rests on its own assumptions and goals, which may suit some types of data better than others. It is therefore important to consider the problem and the data carefully before choosing a fitting method.
  • #1
mikeph
Say I have a model, y = f(x), and ten discrete data points to compare to this model, (x1, y1), ..., (x10, y10). The normal way would then be to take the residuals and square them to get a quality of fit, i.e.

average residuals squared = {[f(x1) - y1]^2 + ... + [f(x10) - y10]^2}/10
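As a minimal sketch of the formula above, the average squared residual can be computed as follows. The model `model` and the data values are made up for illustration and are not from the original post.

```python
def mean_squared_residual(f, xs, ys):
    """Average of [f(x_i) - y_i]^2 over the data points."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical model and made-up data, for illustration only.
def model(x):
    return 2.0 * x + 1.0


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.5, 4.5, 7.0]

print(mean_squared_residual(model, xs, ys))  # 0.125
```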

I also remember being told that if this value is minimised then the model f(x) is the best estimate of the data, assuming the data contains only Gaussian noise?
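The usual argument behind that statement can be sketched as follows. It assumes the noise is independent from point to point, has zero mean, and has the same variance at every point. These conditions are not stated in the post. The parameter vector θ of the model is also an assumed symbol. With n data points (n = 10 here):

```latex
y_i = f(x_i;\theta) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \ \text{independently}

\log L(\theta) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl[f(x_i;\theta) - y_i\bigr]^2
```

Only the last term depends on θ, so under these assumptions maximizing the likelihood is the same as minimizing the sum (or average) of squared residuals. "Best" then means "maximum likelihood", which is a narrower claim than "best estimate" in general.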

Say instead my data were continuous (for whatever reason). Is it an equally rigorous idea to try to minimise the continuous sum of the residual squared? For example if my data is y = g(x), then the continuous version of the residual is

average residual squared = integral of (f(x) - g(x))^2 dx.

Does this make sense, is this the correct approach to comparing a continuous data set and a model?
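A rough numerical sketch of the continuous version, assuming both f and g can be evaluated anywhere on an interval [a, b]. The trapezoid rule, the function names, and the example functions are illustrative choices, not part of the question. Dividing the integral by (b - a) turns it into an average, analogous to dividing the discrete sum by 10.

```python
def integrated_squared_residual(f, g, a, b, n=1000):
    """Trapezoid-rule approximation of the integral of (f(x) - g(x))^2 over [a, b]."""
    if n < 1 or b <= a:
        raise ValueError("need n >= 1 and b > a")
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        weight = 0.5 if i in (0, n) else 1.0  # end points count half
        total += weight * (f(x) - g(x)) ** 2
    return total * h


def mean_squared_difference(f, g, a, b, n=1000):
    """Integrated squared residual divided by the interval length."""
    return integrated_squared_residual(f, g, a, b, n) / (b - a)


# Made-up example: model f(x) = x against "data" g(x) = x^2 on [0, 1].
# The exact integral of (x - x^2)^2 over [0, 1] is 1/30.
approx = integrated_squared_residual(lambda x: x, lambda x: x * x, 0.0, 1.0)
print(approx)
```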

Thanks

Edit: I can maybe put this a better way. Rather than only comparing the data to f(x) at the points where we have measured data, which seems a bit biased to me, why don't we measure it over the entire range of x, and then say "the most we can obtain from our data is that the function looks like a stepwise function with step heights equal to y1, y2, ...", and then compute the residual in terms of the area between the model and the stepwise function.
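One way to make the stepwise idea concrete is sketched below. The post does not say where the steps should change height, so this sketch assumes the edges sit at the midpoints between neighbouring x values (nearest-neighbour interpolation). The function names, the midpoint-rule integration, and the data are likewise illustrative assumptions.

```python
import bisect


def make_step_function(xs, ys):
    """Piecewise-constant g(x) that returns the y of the nearest data point.

    Assumes xs is sorted in ascending order. Placing the step edges at the
    midpoints between neighbouring x values is an assumption of this sketch.
    """
    edges = [(xs[i] + xs[i + 1]) / 2 for i in range(len(xs) - 1)]

    def g(x):
        return ys[bisect.bisect_right(edges, x)]

    return g


def area_between(f, g, a, b, n=10000):
    """Midpoint-rule approximation of the integral of |f(x) - g(x)| over [a, b]."""
    h = (b - a) / n
    mids = (a + (i + 0.5) * h for i in range(n))
    return h * sum(abs(f(x) - g(x)) for x in mids)


# Made-up data lying exactly on the model f(x) = x.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
step = make_step_function(xs, ys)
area = area_between(lambda x: x, step, xs[0], xs[-1])
print(area)
```

In this made-up example the model passes through every data point, so the discrete residuals are all zero, yet the area between the model and the step function is about 0.5. The result therefore depends on the interpolation chosen for the data as well as on the model.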
 
  • #2
Asking what the "best" way to fit a model to data is like asking for the best color to paint a room. It isn't a mathematical question unless you precisely define what "best" means to you.

If you precisely define the meaning of "best", then you need a lot of information (or a lot of assumptions) to solve the problem. Otherwise, finding the best way is as futile as trying to find the missing sides and angles of a triangle when all you know is one side and one angle.

People often define "best" fit to mean a model that minimizes the sum of the squares of the "errors" or "residuals" between the fitting equation and the data. In the continuous case, some people define "best" to mean a fit that minimizes the integral of the square of the difference between the fit and a continuous version of the data.

In the case where the data is assumed to come from a probability distribution, people sometimes define the "best" fit to be the one that minimizes the sum of the squared residuals between the fitted cumulative distribution and the cumulative distribution of the data. This is the method that you proposed in your Edit.
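A minimal sketch of that cumulative-distribution criterion, assuming a normal distribution as the fitted family and using the convention that the empirical cumulative distribution at the i-th smallest sample (of n) equals i/n. Both choices, along with the sample values, are assumptions made for illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cdf_squared_residuals(samples, cdf):
    """Sum of [cdf(x_(i)) - i/n]^2 over the sorted samples x_(1) <= ... <= x_(n)."""
    ordered = sorted(samples)
    n = len(ordered)
    return sum((cdf(x) - (i + 1) / n) ** 2 for i, x in enumerate(ordered))


# Made-up samples and two candidate fits, for illustration only.
samples = [-1.2, -0.4, 0.1, 0.5, 1.3]
score_near = cdf_squared_residuals(samples, lambda x: normal_cdf(x, 0.0, 1.0))
score_far = cdf_squared_residuals(samples, lambda x: normal_cdf(x, 3.0, 1.0))
print(score_near, score_far)
```

Under this criterion the candidate with the smaller score is the better fit. Here the distribution centred at 0 scores lower than the one centred at 3.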

The above facts are facts about human behavior and culture, not mathematical theorems. People have written mathematical articles about why least squares turns out to be a good way of defining "best" in real world problems. These articles argue that particular goals and particular assumptions are reasonable models for many real world problems and they show that least squares fitting is best according to those goals and assumptions.
 

Related to Comparing discrete data to a continuous model (1D)

1. What is the difference between discrete data and a continuous model?

Discrete data refers to information that can only take on specific, separate values, while a continuous model represents a range of possible values. In other words, discrete data is counted or measured in whole units, while a continuous model is represented by a smooth, unbroken line.

2. How do you compare discrete data to a continuous model?

To compare discrete data to a continuous model, you would first plot the discrete data points on a graph. Then, you can overlay the continuous model on the same graph and visually compare the two. You can also use statistical methods such as regression analysis to determine the relationship between the two sets of data.
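As one example of the regression analysis mentioned above, here is a sketch of an ordinary least-squares straight-line fit using the standard closed-form expressions. The straight-line model, the function name `fit_line`, and the data are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the model y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equal-length inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)  # 2.0 1.0
```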

3. What are some examples of discrete data and continuous models?

Examples of discrete data include the number of students in a classroom, the number of cars sold in a month, or the number of pets in a household. Examples of quantities usually described by continuous models include the height of a person, the temperature in a room, or the speed of a moving object.

4. Can you use a continuous model to represent discrete data?

Yes, a continuous model can be used to represent discrete data by approximating the data points with a smooth curve. However, this may not accurately reflect the exact values of the discrete data points, so it is important to consider the limitations of using a continuous model with discrete data.

5. What are the advantages and disadvantages of using a continuous model to represent discrete data?

The advantage of using a continuous model is that it allows for easier visualization and analysis of the data. However, the disadvantage is that it may not accurately represent the specific values of the discrete data points, and may lead to errors in interpretation. Additionally, a continuous model may not be appropriate for discrete data sets with a small number of data points.
