Always positive function with regression

In summary, the problem is to fit a regression of the machining cutting force against several cutting parameters (cutting speed, depth of cut, ...). The cutting force has to be always positive, but with a limited set of parameter values (not ALL possible speeds or depths are available), the fitted function for the force isn't always positive.
  • #1
mignu
I'm attempting to solve a multiple regression.
My problem is that I want the resultant function to be always positive.

I need a regression of the machining cutting force for several values of cutting parameters (cutting speed, depth of cut, ...).
The cutting force has to be always positive, but with my limited set of parameters (I don't have ALL the possible speeds or all the possible depths) the resultant function for the force isn't always positive.

For example: in my experiments the tool diameter varies from 15 to 70. If I use my regression and try to calculate the cutting force with, for example, diameter 10, I get a negative value. That's unacceptable; the cutting force has to be positive.

I need to constrain the regression so that the function is always positive with the coefficients found and for ANY value of the cutting parameters (or at least for positive values, since negative diameters, depths of cut, etc. don't exist either). Some coefficients have to be positive and others negative, but the final function has to be always positive for positive cutting parameter values.

How can I do this? I use Matlab, but any program that solves my problem would be great.
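
One common remedy - a minimal sketch in Matlab, not something prescribed in this thread - is to fit the regression to the logarithm of the force, for example with a power-law model F = C * v^a * d^b, so the back-transformed prediction exp(...) is positive for any positive cutting parameters. The variable names and all data below are made-up placeholders:

    % A minimal sketch, assuming a power-law cutting-force model and
    % invented placeholder data; v = cutting speed, d = depth of cut.
    v = [100; 150; 200; 250];
    d = [0.5; 1.0; 1.5; 2.0];
    F = [300; 420; 520; 610];            % measured cutting forces, all > 0

    % Fit log F = log C + a*log v + b*log d by ordinary least squares.
    X = [ones(size(v)), log(v), log(d)];
    c = X \ log(F);

    % Back-transform: exp() makes the prediction positive by construction,
    % even outside the data range (extrapolation remains risky, though).
    Fpred = exp([1, log(180), log(1.2)] * c);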
 
  • #2
Putting aside your empirical question for a moment: what does theory tell you should happen to the cutting force as the diameter goes to zero? Should it also go to zero, go to some other value, or become infinite?
 
  • #3
"For exaple: in my experiments I have the tool diameter varying from 15 to 70. If I use my regression and try to calculate the cutting force, for example, with diameter 10 I get a negative value."

You are also neglecting a basic idea of regression: technically, your regression equation is valid only over the range of the collected x-values; without data near x = 10, you have no basis for applying the equation there.
 
  • #4
Certainly the prediction confidence interval expands very rapidly as one moves beyond the limits of observed data.
 
  • #5
EnumaElish said:
Certainly the prediction confidence interval expands very rapidly as one moves beyond the limits of observed data.

It can be a little more than that. Back in the "old days" of photography, when film was used, the response curves for exposure were important to know. The curves were remarkably linear in the middle (for typical exposure times), but for extremely long or extremely short exposures there was a huge departure from the linear pattern - plateaus to the right and left, basically. Exposure estimates based on the linear portion, but aimed at the extremes, were doomed to fail - you needed data from those plateaus to know the "correct" exposure values.

The behavior of a regression model cannot be determined outside the range of the collected data. The software will let you extrapolate, but that doesn't mean the results are meaningful.
 
  • #6
Your post nicely highlights that there is more than one issue related to the current problem.

If "theory" (or common sense) tells me that the true model is non-linear, then I know that linear regression is at best a local approximation, which is your point.

But suppose that I had a theory that told me exactly what the true (linear or nonlinear) model is, for example, y = b1/x + b2 x + b3 x^2 (or: [insert complicated, nonlinear formula]). Then I could fit _this_ model to data, yet even then I need to worry about predicting outside of the range, because the prediction interval inflates rather rapidly.
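
As a side note on fitting such a model: y = b1/x + b2 x + b3 x^2 is nonlinear in x but linear in the coefficients, so ordinary least squares still applies after transforming the regressors. A minimal Matlab sketch with invented data:

    % The hypothetical model above is linear in b1, b2, b3, so OLS works
    % on transformed regressors. The data are invented for illustration.
    x = (1:10)';
    y = 2./x + 0.5*x + 0.1*x.^2 + 0.05*randn(10,1);
    B = [1./x, x, x.^2];      % transformed design matrix
    b = B \ y;                % least-squares estimates of [b1; b2; b3]
    yhat = B * b;             % fitted values; extrapolation caveats remain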
 
  • #7
Ah, if I understand this point, I would say there is no need for estimation after the determination of the parameters. I make this comment viewing your hypothetical model as deterministic. If it isn't meant to be, I apologize for being thick tonight.
 
  • #8
That was not the point I was trying to make, but you needn't have apologized. My post was far from clear. Here's a hypothetical example: y measures the likelihood of death within 5 years, and x is the cholesterol level. Define z = -Log(1/y - 1), the "logit value." Since z = -Log(1/y - 1), I can also write y = 1/(1 + Exp(-z)).

y (probability) , x (cholesterol) , z (logit)
0.990872864 , 9 , 4.687334231
0.943313766 , 8 , 2.811867565
0.644459353 , 7 , 0.594772174
0.785788908 , 7 , 1.299726253
0.719638278 , 3 , 0.94266806
0.988224114 , 9 , 4.429855602
0.54455151 , 5 , 0.178679914
0.977860268 , 7 , 3.78799294
0.766413964 , 4 , 1.188171972
0.987292192 , 8 , 4.352749407
0.672823862 , 4 , 0.720984899
0.836917534 , 4 , 1.635469543
0.988351439 , 7 , 4.440855722
0.982624412 , 7 , 4.035160735
0.867312958 , 5 , 1.877406589
0.849853063 , 5 , 1.733449073
0.839829058 , 6 , 1.656956736
0.96094483 , 7 , 3.202941748
0.695135331 , 5 , 0.824238576
0.779657846 , 6 , 1.263673583

Suppose that the true model is logit: z = a0 + a1 x + u. Alternatively, I can be "naive" and estimate a linear probability model, as y = b0 + b1 x + v. My estimation results, rounded to two decimals, are:

Logit: z = -1.81 + 0.67x (F stat = 23.97)
Linear: y = 0.52 + 0.05x (F stat = 13.74)

Now suppose I'd like to predict the likelihood of death Y(x) for two out-of-sample values, x = 11 and x = 0. The logit model gives predicted probabilities Y(11) = 0.9960 and Y(0) = 0.1403, respectively. These are reasonable probability values. Using the linear model, however, I obtain Y(11) = 1.0939 and Y(0) = 0.5206, and I calculate the 2-standard-deviation prediction interval around Y(0) as (0.2645, 0.7766). To summarize the prediction results:

Logit: z = -1.81 + 0.67x
y , x , z
0.99598333 , 11 , 5.513277315
0.140328661 , 0 , -1.812562899

Linear: y = 0.52 + 0.05x
y , x
1.093857079 , 11
0.520578354 , 0
2-standard-deviation prediction interval around 0.520578354 = (0.264523738, 0.776632969)
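
A minimal Matlab sketch of both fits and the out-of-sample predictions, using the data tabled above (coefficients computed this way may differ from the rounded values in later decimals):

    % y, x from the table above; z is the logit transform of y.
    y = [0.990872864; 0.943313766; 0.644459353; 0.785788908; 0.719638278; ...
         0.988224114; 0.54455151;  0.977860268; 0.766413964; 0.987292192; ...
         0.672823862; 0.836917534; 0.988351439; 0.982624412; 0.867312958; ...
         0.849853063; 0.839829058; 0.96094483;  0.695135331; 0.779657846];
    x = [9; 8; 7; 7; 3; 9; 5; 7; 4; 8; 4; 4; 7; 7; 5; 5; 6; 7; 5; 6];
    z = -log(1 ./ y - 1);

    X = [ones(size(x)), x];
    a = X \ z;                 % logit model:  z = a0 + a1*x
    b = X \ y;                 % linear model: y = b0 + b1*x

    xs = [11; 0];              % out-of-sample points
    Xs = [ones(size(xs)), xs];
    y_logit  = 1 ./ (1 + exp(-(Xs * a)));   % stays inside (0, 1)
    y_linear = Xs * b;                       % can escape [0, 1]

    % 2-standard-deviation prediction interval around the linear
    % prediction at x = 0, which should roughly match the interval above.
    n  = numel(x);
    s  = sqrt(sum((y - X*b).^2) / (n - 2));
    se = s * sqrt(1 + 1/n + (0 - mean(x))^2 / sum((x - mean(x)).^2));
    pi0 = y_linear(2) + [-2, 2] * se;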

So there are at least two problems with out-of-sample prediction using the linear model. First, it can produce a "probability" value > 1. (This is similar to the problem in the OP.) But second, even when it doesn't, the prediction interval can become quite large.

The solution for the first problem is to estimate the "true model" (in this case, the logit model). There isn't a solution for the second problem, other than exercising caution when predicting out of the sample.

Even when the linear model predicts a probability value within the [0,1] interval, that doesn't mean it's the right (or approximately right) predicted value. That's apparent from the fact that the predicted value for x = 0 from the logit model, 0.1403, lies outside the prediction interval (0.2645, 0.7766) for x = 0 in the linear model. I think this was statdad's point: one should be extra careful about extending a "simplistic" model beyond the sample because it is (at best) a local approximation. (The remedy is to estimate the true model.)
 
  • #9
"(The remedy is to estimate the true model.)"

Agreed - provided you know the form of the true model, and experience or solid theory shows that it extends beyond the range of your collected data. Even if it does, as you have pointed out, the confidence bands can be so wide as to be useless in practice.
 

Related to Always positive function with regression

1. What is an "always positive function" in regression?

An always positive function in regression refers to a mathematical function that produces only positive values as output. This means that no matter what input values are used, the resulting output value will always be positive. In regression analysis, this type of function is used to model a response variable that is constrained to be positive, such as a force, a concentration, or a probability.

2. Why is it important to have an always positive function in regression?

Having an always positive function in regression is important because it allows for more accurate and meaningful interpretations of the data. For example, if the response is a physical quantity that cannot be negative, a model that can produce negative predictions does not accurately represent it. Enforcing positivity also helps avoid nonsensical predictions when the model is applied to new parameter values.

3. What are some examples of always positive functions used in regression?

Some examples of always positive functions used in regression include the exponential function and power functions of positive variables; the exponential of any linear predictor, for instance, is positive regardless of the input values. Other commonly used functions in regression, such as linear or quadratic functions, may also be transformed - for example, by fitting the model on the log scale and exponentiating the predictions - to ensure that the output is always positive.

4. How do you ensure that a regression model uses an always positive function?

To ensure that a regression model produces only positive predictions, you can apply mathematical transformations to the variables or change the model itself. Common options include fitting the logarithm (or square root) of the response and back-transforming the predictions, or using a different type of regression model, such as a generalized linear model with a log link. It is important to consider the relationship between the variables and the expected output carefully before deciding on a specific function or transformation.
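
For example, in Matlab (assuming the Statistics and Machine Learning Toolbox is available), a gamma GLM with a log link keeps every prediction positive; the data below are invented for illustration:

    % Invented positive-response data.
    x = (1:20)';
    y = exp(0.5 + 0.1*x) .* (0.9 + 0.2*rand(20,1));

    % Gamma GLM with a log link: the fitted mean exp(linear predictor)
    % is positive for any x, inside or outside the sample.
    mdl   = fitglm(table(x, y), 'y ~ x', 'Distribution', 'gamma', 'Link', 'log');
    ypred = predict(mdl, table((0:25)', 'VariableNames', {'x'}));   % all > 0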

5. Are there any limitations or drawbacks to using an always positive function in regression?

While using an always positive function in regression can be beneficial in many cases, there are limitations to consider. If the response is not truly constrained to be positive, forcing positivity may result in a less accurate or meaningful model. Additionally, some always positive functions have a restricted range of possible values, which can limit the flexibility or accuracy of the model. It is important to evaluate the data carefully and weigh these limitations when choosing an always positive function for regression.
