Difference between MAPE and SSE

  • #1
maistral
TL;DR Summary
Can someone give a link, or perhaps a summary, of the fundamental differences between the two and how they behave as objective functions in minimization?
I am working on an equation that models two dependent variables Y and Z using four regression parameters a, b, c, and d and a single independent variable X. Given a set of values for X, I am going to regress a, b, c, and d to fit Ycalc and Zcalc to Yexpt'l and Zexpt'l.

My problem is this: I tried using both MAPE and SSE (normalized via the standard deviation of each dependent variable) as objective functions:

##\text{MAPE} = \frac{100}{n_X}\sum_{i=1}^{n_X}\frac{|Y_{i,\text{calc}}-Y_{i,\text{expt'l}}|}{Y_{i,\text{expt'l}}} + \frac{100}{n_X}\sum_{i=1}^{n_X}\frac{|Z_{i,\text{calc}}-Z_{i,\text{expt'l}}|}{Z_{i,\text{expt'l}}}##

##\text{SSE} = \sum_{i=1}^{n_X}\left(\frac{Y_{i,\text{calc}}-Y_{i,\text{expt'l}}}{\sigma_{Y_\text{calc}}}\right)^2 + \sum_{i=1}^{n_X}\left(\frac{Z_{i,\text{calc}}-Z_{i,\text{expt'l}}}{\sigma_{Z_\text{calc}}}\right)^2##

All summations are from i = 1 to nX.

My issue is as follows: it always ends up (at least, to my requirement) with MAPE doing a better job of determining the parameters a, b, c, and d in fitting Y and Z. Why is this so? What is the fundamental difference between the two, and why should (or shouldn't) I use MAPE / SSE?
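For concreteness, the two objective functions described above can be sketched in a few lines of Python. This is only an illustration; the array names and the use of `numpy` are my assumptions, not part of the original formulation:

```python
import numpy as np

def mape_objective(y_calc, y_expt, z_calc, z_expt):
    """Sum of the two mean absolute percentage errors (in %), as in the post."""
    n = len(y_expt)
    return (100.0 / n) * np.sum(np.abs(y_calc - y_expt) / np.abs(y_expt)) + \
           (100.0 / n) * np.sum(np.abs(z_calc - z_expt) / np.abs(z_expt))

def sse_objective(y_calc, y_expt, z_calc, z_expt):
    """Sum of squared errors, each residual scaled by a standard deviation."""
    return np.sum(((y_calc - y_expt) / np.std(y_calc)) ** 2) + \
           np.sum(((z_calc - z_expt) / np.std(z_calc)) ** 2)
```

Either function would then be handed to a minimizer that searches over a, b, c, d.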
 
  • #2
maistral said:
I am going to regress a, b, c, and d to fit Ycalc and Zcalc to Yexpt'l and Zexpt'l.

My problem is this: I tried using both MAPE and SSE (normalized via the standard deviation of each dependent variable) as objective functions:

Do you mean that you estimated a, b, c, d in two different ways - one by picking values that minimized MAPE, and the other by picking values that minimized SSE?
My issue is as follows: it always ends up (at least, to my requirement) with MAPE doing a better job of determining the parameters a, b, c, and d in fitting Y and Z.

What criteria are you using to determine which method did a "better" job? How are you measuring the quality of the fit? Do you have a mathematical function that measures the "error" in the fit that is different from MAPE or SSE?
 
  • #3
Stephen Tashi said:
Do you mean that you estimated a, b, c, d in two different ways - one by picking values that minimized MAPE, and the other by picking values that minimized SSE?
Yup, this is what I meant. And it always ended up with the MAPE formulation doing a better job, at least for my requirement.

Stephen Tashi said:
What criteria are you using to determine which method did a "better" job? How are you measuring the quality of the fit? Do you have a mathematical function that measures the "error" in the fit that is different from MAPE or SSE?
No, I mean I have a dataset of (Y, Z) vs. X. When I fit the variables Y and Z, I find that the curve resulting from MAPE behaves more sensibly. My data has a certain funny behavior: it grows roughly exponentially over the low and mid ranges, then suddenly shoots up at the end of the data range. Also, on a lesser note, I can draw certain linear or quadratic relations between a, b, c, and d and other variables, which 'generalizes' them somehow.

Actually, I've decided that I'm going to use MAPE because of this. I just could not tell why MAPE does a better job than SSE, which goes back to my original question - may I know what is the mathematical tendency of MAPE compared to SSE?
 
  • #4
Upon further reading I found out that MAPE is supposed to be scale-independent, while SSE is scale-dependent. May I know what scale-dependence means? And other than this, are there any other fundamental differences in mathematical tendency between the two?
 
  • #5
maistral said:
No, I mean I have a dataset of (Y, Z) vs. X. When I fit the variables Y and Z, I find that the curve resulting from MAPE behaves more sensibly. My data has a certain funny behavior: it grows roughly exponentially over the low and mid ranges, then suddenly shoots up at the end of the data range.

may I know what is the mathematical tendency of MAPE compared to SSE?

If you want mathematical answers, you'll have to ask questions that are mathematically precise. The visual appeal of a curve fit is subjective and can vary from person to person. To get specific advice, I suggest you post data or some graphs. That type of question usually gets a lot of suggestions.

Whether a curve fit "looks right" is not a purely mathematical question. It involves whatever field of science applies to the data.

Upon further reading I found out that MAPE is supposed to be scale-independent, while SSE is scale-dependent.
Those are somewhat ambiguous statements. Are you thinking of "MAPE" as method of curve fitting versus thinking of it as a single number that measures how well a curve fits?

Suppose we are measuring the X data in cm and the Y data in kg, and we compute the curve that minimizes the mean absolute percentage error. The values of the parameters of that curve are ##a_1,b_1,c_1,d_1##. Then we express the Y data in grams (thus changing the scale of the Y data) and compute another curve that minimizes the mean absolute percentage error to the rescaled data. The parameters of that curve are ##a_2,b_2,c_2,d_2##. In general, it will turn out that ##a_1 \ne a_2, b_1 \ne b_2, c_1 \ne c_2, d_1 \ne d_2##. Considering "MAPE" to be a method of curve fitting, the values obtained for the parameters are not scale-independent. However, the mean absolute percentage errors produced by the two curves are identical. So, considering MAPE to be a single number measuring how well a family of curves fits, its value is scale-independent.
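The "single number" half of that distinction is easy to check numerically. A small sketch with made-up numbers (the arrays stand in for the kg measurements and one fitted curve's predictions; note it compares a plain, unnormalized SSE, not the σ-normalized version from post #1):

```python
import numpy as np

y_expt = np.array([2.0, 3.0, 5.0])   # measurements, say in kg (made-up)
y_calc = np.array([2.1, 2.9, 5.3])   # predictions of some fitted curve

def mape(calc, expt):
    return 100.0 * np.mean(np.abs(calc - expt) / np.abs(expt))

def sse(calc, expt):
    return np.sum((calc - expt) ** 2)

# Rescale kg -> g: data and predictions are both multiplied by 1000.
m_kg, m_g = mape(y_calc, y_expt), mape(1000 * y_calc, 1000 * y_expt)
s_kg, s_g = sse(y_calc, y_expt), sse(1000 * y_calc, 1000 * y_expt)

print(np.isclose(m_kg, m_g))   # MAPE value is unchanged by the rescaling
print(s_g / s_kg)              # unnormalized SSE grows by a factor ~1000**2
```

Incidentally, the σ-normalized SSE of post #1 is also invariant under this rescaling, since the standard deviation rescales by the same factor as the residuals.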
 
  • #6
Hi, and thanks for replying.

Stephen Tashi said:
If you want mathematical answers, you'll have to ask questions that are mathematically precise. The visual appeal of a curve fit is subjective and can vary from person to person. To get specific advice, I suggest you post data or some graphs. That type of question usually gets a lot of suggestions.
Actually, I was referring to how MAPE reduces errors compared to SSE, because I find it weird that they have the same numerators and yet converge in an entirely different manner.

Stephen Tashi said:
Whether a curve fit "looks right" is not a purely mathematical question. It involves whatever field of science applies to the data.

Actually, I can vouch for this: I can say that SSE does not give a curve that "looks right", and that MAPE does. Did this come from a book, or from your experience? I think I would need to make this argument in my write-up.

For reference, these are my results for A vs. a parameter where the values for A are generalized via the parameter:

This one's from SSE
[attached plot: A vs. the generalizing parameter, from the SSE fit]


And this one's from MAPE
[attached plot: A vs. the generalizing parameter, from the MAPE fit]
 
  • #7
maistral said:
For reference, these are my results for A vs. a parameter where the values for A are generalized via the parameter:

I don't know what you mean by "the values for A are generalized via the parameter".

Relevant to the difference between MAPE and SSE fitting would be graphs of the same data with two curves fit to it, one minimizing MAPE and one minimizing SSE. The two graphs you show look like they plot different data.
 
  • #8
Stephen Tashi said:
I don't know what you mean by "the values for A are generalized via the parameter".

Relevant to the difference between MAPE and SSE fitting would be graphs of the same data with two curves fit to it, one minimizing MAPE and one minimizing SSE. The two graphs you show look like they plot different data.
They're supposed to be different.

It's like this: I fit A, B, C, and D in a Y, Z vs. X equation. I then get different values of A, B, C, and D for each set of Y, Z vs. X data.

Then, if I generalize A, B, C, and D by plotting them against a certain parameter (which is known in our field to work), those graphs appear. MAPE generalizes A better than SSE does.
 
  • #9
maistral said:
It's like this: I fit A, B, C, and D in a Y, Z vs. X equation. I then get different values of A, B, C, and D for each set of Y, Z vs. X data.

Then, if I generalize A, B, C, and D by plotting them against a certain parameter (which is known in our field to work), those graphs appear. MAPE generalizes A better than SSE does.

So this is a more complicated scenario than simply fitting one curve to one set of data. However, we should start by looking at the curve fits for the individual sets of Y, Z vs. X data.

It's not clear what you mean by "generalizing A". Are you saying that you have a theoretical equation that predicts "A" as a function of the "certain parameter"? Does each set of Y, Z vs. X data correspond to a single value of that certain parameter?

I assume you understand how "percentage error" differs qualitatively from "squared error" - for example, (110-100)/100 is the same percentage error as (1.1-1.0)/1.0, but (110-100)^2 is a much larger squared error than (1.1-1.0)^2.
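The arithmetic in that example can be checked directly; a trivial sketch:

```python
# Same relative (percentage) error, very different squared error.
pct_large = (110 - 100) / 100    # 0.1, i.e. 10 %
pct_small = (1.1 - 1.0) / 1.0    # ~0.1, also 10 % up to rounding

sq_large = (110 - 100) ** 2      # 100
sq_small = (1.1 - 1.0) ** 2      # ~0.01, ten thousand times smaller
```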
 
  • #10
Hi. I was able to replicate my problem in a simpler setting. I hope this brings in more insight.

So what I did was generate data from y = exp(x) + rand(). Then I fit a quadratic function to the dataset twice: the first time (orange) using the SSE formulation (minimizing the sum of the squared residuals), and the second using the MAPE formulation (minimizing the mean absolute percentage error). The results are as follows:

[attached plot: quadratic fits to the y = exp(x) + rand() data, SSE vs. MAPE]


This is what I 'meant' before. Why are the MAPE and SSE curves (and, in turn, the coefficients of the fitted quadratic polynomial) not the same? I know the objective functions are obviously not the same, but what are the differences between how MAPE and SSE handle errors such that it ends up this way?

For my study, MAPE seems to work better - and it is the one used by almost all research in my field - but IMO it's brain-dead to use something just because everyone else is using it while no one knows the concept behind it. I still cannot tell why it works better than SSE. That's why I was asking how SSE and MAPE handle errors. Like, does the 1/n coefficient in MAPE have any bearing? Or something?
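The experiment described above can be reproduced in a few lines of Python. This is a sketch under my own assumptions (scipy's Nelder-Mead for the MAPE minimization, uniform noise, and an x-range where exp(x) spans about an order of magnitude), not the poster's spreadsheet:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.5, 4.0, 40)
y = np.exp(x) + rng.uniform(0.0, 1.0, x.size)   # y = exp(x) + rand()

def quad(p, x):
    a, b, c = p
    return a * x**2 + b * x + c

# SSE fit: ordinary least squares (np.polyfit minimizes the sum of squares).
p_sse = np.polyfit(x, y, 2)

# MAPE fit: minimize the mean absolute percentage error numerically.
p_mape = minimize(lambda p: np.mean(np.abs(quad(p, x) - y) / y),
                  x0=p_sse, method="Nelder-Mead").x

print("SSE coefficients: ", p_sse)
print("MAPE coefficients:", p_mape)
```

The two coefficient vectors come out different: the MAPE fit tracks the small-y (low-x) points more closely, while the SSE fit spends its effort on the large absolute residuals at the right end of the range, consistent with the plots in the post.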

EDIT: I'm updating the spreadsheet file to include the following.

So I tested what happens if I have only a single measurement at a certain point, and there MAPE works better. SSE works better at the larger x-values, though:
[attached plot: fits with a single measurement at one point, MAPE vs. SSE]


May I know why this is happening?
 

Attachments

  • quadratic-regression-in-class-dataset.xlsx
    37 KB
  • #11
maistral said:
This is what I 'meant' before. Why are the MAPE and SSE curves (and, in turn, the coefficients of the fitted quadratic polynomial) not the same?
I can't understand why you would expect the fits to be the same curve. If there happened to be one curve that produced zero error at all points, then I could understand why MAPE and SSE would both result in that curve. Otherwise, why should they produce the same result?

While I know the objective functions are obviously not the same, what are the differences between MAPE and SSE in handling errors such that it ended up this way?
Let's take a simple case. Suppose there are 2 data points, ## y = 10, y = 100## and you want to find a single number to approximate those two values.

The value ##a## that minimizes the MSE = ##(1/2)( (a-10)^2 + (a-100)^2) ## is ##a = (10 + 100)/2 = 55##.

The value ##a## that minimizes MAPE = ##(1/2)( |a-10|/10 + |a-100|/100 )## is ##a = 10##.

In a manner of speaking, minimizing MAPE doesn't consider the error between 100 and ##a=10## to be large, but SSE does.
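That claim is easy to verify numerically; a brute-force sketch over a grid of candidate constants:

```python
import numpy as np

a = np.linspace(0.0, 120.0, 120001)   # candidate constants, step 0.001

mse  = 0.5 * ((a - 10) ** 2 + (a - 100) ** 2)
mape = 0.5 * (np.abs(a - 10) / 10 + np.abs(a - 100) / 100)

print(a[np.argmin(mse)])    # ~55: squared error picks the mean of 10 and 100
print(a[np.argmin(mape)])   # ~10: percentage error pulls toward the small value
```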

For my study, MAPE seems to work better - and it is the one used by almost all research in my field - but IMO it's brain-dead to use something just because everyone else is using it while no one knows the concept behind it.

Without the specifics of the problem, I can only speculate why MAPE would work better. Suppose we are trying to fit a theoretical curve ##A = g(y)## to data, and we must do this by first fitting ##y = f(x)## to some data for ##x##. The actual function ##g(y)## might be something like ##g(y) = y + 1##, where an error of ##\delta y = 10## in ##y## produces the same error in ##g(y)## regardless of whether we are dealing with ##y = 10## or ##y = 100##. On the other hand, ##g(y)## might be something like ##g(y) = (y + 1)/y##, where an error of ##\delta y = 10## produces different errors in ##g(y)## depending on whether ##y = 10## or ##y = 100##. The first case suggests fitting ##f(x)## to data using SSE. The second case suggests fitting ##f(x)## to data using MAPE.
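The two cases above can be illustrated with a small sketch (the functions and ##\delta y = 10## are taken from the post; the variable names are mine):

```python
def g_abs(y):       # g(y) = y + 1: an error in y passes through unchanged
    return y + 1.0

def g_rel(y):       # g(y) = (y + 1)/y: sensitivity to y falls off like 1/y**2
    return (y + 1.0) / y

dy = 10.0
for y0 in (10.0, 100.0):
    err_abs = abs(g_abs(y0 + dy) - g_abs(y0))   # 10.0 at both y0 values
    err_rel = abs(g_rel(y0 + dy) - g_rel(y0))   # ~0.05 at y0=10, ~0.0009 at y0=100
    print(y0, err_abs, err_rel)
```

In the first case the damage done by ##\delta y## is the same everywhere, so weighting all residuals equally (SSE) is natural; in the second, the same ##\delta y## matters far more at small ##y##, which is exactly where MAPE concentrates its weight.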
 
  • #12
Stephen Tashi said:
Let's take a simple case. Suppose there are 2 data points, ## y = 10, y = 100## and you want to find a single number to approximate those two values.

The value ##a## that minimizes the MSE = ##(1/2)( (a-10)^2 + (a-100)^2) ## is ##a = (10 + 100)/2 = 55##.

The value ##a## minimizes MAPE = ## (1/2)( |a-10|/10 + | a-100|/100) ## is ##a = 10##

In manner of speaking, minimizing MAPE doesn't consider the error between 100 and ##a=10## to be large, but SSE does.

Oh, this is what I meant about how they handle errors. Thank you very much for this.
Stephen Tashi said:
Without the specifics of the problem, I can only speculate why MAPE would work better. Suppose we are trying to fit a theoretical curve ##A = g(y)## to data, and we must do this by first fitting ##y = f(x)## to some data for ##x##. The actual function ##g(y)## might be something like ##g(y) = y + 1##, where an error of ##\delta y = 10## in ##y## produces the same error in ##g(y)## regardless of whether we are dealing with ##y = 10## or ##y = 100##. On the other hand, ##g(y)## might be something like ##g(y) = (y + 1)/y##, where an error of ##\delta y = 10## produces different errors in ##g(y)## depending on whether ##y = 10## or ##y = 100##. The first case suggests fitting ##f(x)## to data using SSE. The second case suggests fitting ##f(x)## to data using MAPE.

And this is the kind of argument I need for my study, lol. However, there are a few things I cannot quite understand.

As far as I understood from your example: if I have, say, an equation g(y), and I have no prior knowledge of this function except the data generated from it, plus errors in y that are roughly the same size for all values of y, I should use SSE?

And if, again, I have no prior knowledge of g(y) except the data it generates, but the errors vary widely depending on the value of y, I should use MAPE?
 

Related to Difference between MAPE and SSE

1. What are MAPE and SSE?

MAPE stands for Mean Absolute Percentage Error and SSE stands for Sum of Squared Errors. They are both metrics used to measure the accuracy of a forecasting model.

2. What is the difference between MAPE and SSE?

The main difference is that MAPE measures the percentage difference between the forecasted and actual values, while SSE measures the sum of the squared differences between them. MAPE is a relative measure, while SSE is an absolute measure.

3. Which metric is better to use, MAPE or SSE?

It depends on the specific needs and goals of the analysis. MAPE is more useful when the data spans a wide range of magnitudes, since it weights each error relative to the actual value. SSE is better suited to data in a narrow range, and it gives more weight to large absolute errors.

4. Can MAPE and SSE be used together?

Yes, MAPE and SSE can be used together to get a more comprehensive understanding of the accuracy of a forecasting model. MAPE can give an overall picture of the model's performance, while SSE can provide more detailed information about the magnitude of errors.

5. How do you interpret MAPE and SSE values?

MAPE is usually expressed as a percentage, with lower values indicating better accuracy. SSE is expressed in the same units as the data, with lower values also indicating better accuracy. It is important to compare these values to the baseline model or other models to determine the effectiveness of the forecasting model.
