Help w/formula for data points

In summary, the original poster is looking for a simple formula to calculate temperature from the reading of an ADC. The poster has measured and recorded corresponding temperature and ADC values and wants a formula that accurately interpolates between the measured points. Responders suggest a higher-order fit and averaging repeated measurements to improve accuracy, and propose a scaled equation using integer math to fit within the limitations of the microprocessor.
  • #1
RexLan
I am making a gauge using a microprocessor. I have several sensors and will post one now. It is basically a resistor whose value changes with temperature.

I connect the sensor voltage (through a voltage divider) to an ADC (analog-to-digital converter) that the micro can read, and it returns a number between 0 and 1023 that corresponds to the temperature.

I need a SIMPLE formula that will let me calculate the temperature based on the number I receive from the ADC.

The micro only does 16 bit math (no floating point) so it is very difficult to work with. It has normal arithmetic and a modulus (//) operator for the division remainder.

These are the numbers for the graph as I have accurately measured them. The formula needs to use the ADC value so I can calculate the temperature.

I will post the other two after this one.

Thanks for the help.



Temp ADC
100 894
112 868
125 849
138 818
150 780
163 754
175 678
180 676
188 666
195 645
200 622
205 606
213 580
220 554
225 531
230 516
238 484
245 457
250 440
255 420
263 377
275 330
 
  • #2
By far the easiest method is to just store a table of data points such as the one you have already measured and then do some kind of interpolation for intermediate points. Your data is close enough to linear that a simple linear interpolation between the given points will be very accurate.
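A minimal sketch of the table-plus-interpolation idea, written in Python for readability rather than PICAXE BASIC, but using only integer arithmetic as the chip would. The table is the post #1 data; the name `interpolate_temp` and the clamping of out-of-range readings to the end points are illustrative choices, not part of the original suggestion.

```python
# Piecewise-linear interpolation over the measured (ADC, Temp) table,
# integer arithmetic only. Rows are ordered by falling ADC (rising Temp).
TABLE = [
    (894, 100), (868, 112), (849, 125), (818, 138), (780, 150),
    (754, 163), (678, 175), (676, 180), (666, 188), (645, 195),
    (622, 200), (606, 205), (580, 213), (554, 220), (531, 225),
    (516, 230), (484, 238), (457, 245), (440, 250), (420, 255),
    (377, 263), (330, 275),
]


def interpolate_temp(adc: int) -> int:
    """Return an integer temperature for a raw ADC reading."""
    # Clamp readings outside the measured range to the end points.
    if adc >= TABLE[0][0]:
        return TABLE[0][1]
    if adc <= TABLE[-1][0]:
        return TABLE[-1][1]
    # Find the two neighbouring rows that bracket the reading.
    for (adc_hi, t_lo), (adc_lo, t_hi) in zip(TABLE, TABLE[1:]):
        if adc_lo <= adc <= adc_hi:
            # Products here stay below 1000 for this table, far under 65535.
            return t_lo + (adc_hi - adc) * (t_hi - t_lo) // (adc_hi - adc_lo)
    raise ValueError("table must be strictly decreasing in ADC")


print(interpolate_temp(881), interpolate_temp(622))
```

Storing the table costs 22 pairs of 16-bit words per sensor, which is exactly the storage concern raised in the next reply.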

Before you start, however, you might want to repeat your measurements a few times and average them. If you plot your data points, there are a few small kinks that might be measurement error. Overall, the data could be approximated fairly well by a parabola if you wanted to fit a smooth curve to it and iron out the kinks.
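To see whether a parabola actually beats a straight line on this data, the sketch below fits both by least squares (normal equations solved with a small Gaussian elimination) and compares the sum of squared residuals (SSE). It is plain Python with floats, meant for the PC side of the analysis; `fit_sse` and `solve` are hypothetical helper names, and the coefficients are in terms of the centred variable u = ADC - mean(ADC), not ADC itself.

```python
ADC = [894, 868, 849, 818, 780, 754, 678, 676, 666, 645, 622,
       606, 580, 554, 531, 516, 484, 457, 440, 420, 377, 330]
TEMP = [100, 112, 125, 138, 150, 163, 175, 180, 188, 195, 200,
        205, 213, 220, 225, 230, 238, 245, 250, 255, 263, 275]


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def fit_sse(xs, ys, degree):
    """Fit y = c0 + c1*u + ... + cd*u^d with u = x - mean(x).

    Returns (coefficients, sum of squared residuals)."""
    x0 = sum(xs) / len(xs)
    us = [x - x0 for x in xs]  # centring keeps the normal equations well scaled
    m = degree + 1
    matrix = [[sum(u ** (i + j) for u in us) for j in range(m)] for i in range(m)]
    rhs = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(m)]
    coeffs = solve(matrix, rhs)
    sse = sum((y - sum(c * u ** k for k, c in enumerate(coeffs))) ** 2
              for u, y in zip(us, ys))
    return coeffs, sse


line_coeffs, sse_line = fit_sse(ADC, TEMP, 1)
quad_coeffs, sse_parabola = fit_sse(ADC, TEMP, 2)
print(f"straight line SSE = {sse_line:.1f}, parabola SSE = {sse_parabola:.1f}")
```

Because a straight line is a special case of a parabola, the quadratic SSE can never be larger; the practical question is whether the drop is big enough to justify the extra terms on the chip.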
 
  • #3
BTW, here is a graph of your data showing a "best fit" parabola.
 

[Attachment: Image2.gif, graph of the data with a best-fit parabola]
  • #4
uart said:
By far the easiest method is to just store a table of data points such as the one you have already measured and then do some kind of interpolation for intermediate points. Your data is close enough to linear that a simple linear interpolation between the given points will be very accurate.

Thank you ... Unfortunately I do not have enough storage for 3 sets of data. The running program just about has the chip maxed out so I was hoping for a formula based on the ADC value.

The graph you made isn't visible yet; it says it is awaiting approval. I have run the test twice and the values are very close.

What I will see, however, is that with 5% resistors for the divider there will be some variation in the voltage.

Formula?
 
  • #5
When you do the analysis in Excel, would a straight-line fit to the data give you sufficient accuracy? That would then only be a couple of numbers that you would have to store for each sensor's calibration.
 
  • #6
berkeman said:
When you do the analysis in Excel, would a straight-line fit to the data give you sufficient accuracy? That would then only be a couple of numbers that you would have to store for each sensor's calibration.

Yes it would. I have two temperature sensors and one pressure sensor.

If I were within 5 degrees or so on the temperature that would be fine. A few PSI on the pressure is OK too. Here are the other two sets of data.

I already make a digital tachometer, and Jeremy and I did some clever stuff with the math since we can only do 16-bit math with no floating point.

That write-up is here:
http://www.tach.rex-deb.net

Pressure data:

Psi Resistance

80 192
75
70 179
65 170
60 156
55 146
50 132
45 114
40 108
35
30 84
25 73
20 61
15 48
10 36
5 24
0 0

OIL Temp
Temp ADC

70 934
100 922
113 893
125 863
138 847
150 780
163 740
175 658
188 593
200 527
213 487
225 458
238 429
250 361
263 332
275 281
288 252
300 228
313 193
325 160
338 145
350 114
363 95
375 82
 
  • #7
RexLan

Real quickly -- a quick Excel linear regression on your Temp-ADC raw data yields the expression

Temp = 384.487 - .303707 * ADC

This regression is close, but there will be some significant error near the endpoints. As "uart" suggests, a higher-order fit could improve results. Also, try to obtain better data, as uart suggests, by performing multiple runs (some of the data is "noisy").
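The regression above can be reproduced without Excel. This sketch uses the standard closed-form least-squares formulas, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x), which give the same ordinary least-squares line as Excel's SLOPE and INTERCEPT functions; `fit_line` is a hypothetical name.

```python
ADC = [894, 868, 849, 818, 780, 754, 678, 676, 666, 645, 622,
       606, 580, 554, 531, 516, 484, 457, 440, 420, 377, 330]
TEMP = [100, 112, 125, 138, 150, 163, 175, 180, 188, 195, 200,
        205, 213, 220, 225, 230, 238, 245, 250, 255, 263, 275]


def fit_line(xs, ys):
    """Ordinary least squares: return (intercept, slope) of y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(ADC, TEMP)
print(f"Temp = {intercept:.3f} - {-slope:.6f} * ADC")
```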

To make a useful integer representation of the equation, consider scaling by 2^7:

Temp = (49214 - 39 * ADC) / 128

For your range of data, the calculations should fall within 16 bits.

Compute the numerator with integer math in any 16-bit register. The division by 128 can be accomplished with a shift right of 7 bits.

This will get you very close to the real math representation of the linear regression line.
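A quick way to check the scaled formula before loading it onto the chip is to emulate unsigned 16-bit arithmetic on a PC. The sketch below assumes 16-bit variables that wrap modulo 65536 (modelled with a `& 0xFFFF` mask) and applies the divide-by-128 as a right shift; `temp_scaled` is an illustrative name.

```python
WORD = 0xFFFF  # unsigned 16-bit mask


def temp_scaled(adc: int) -> int:
    """Temp = (49214 - 39 * ADC) / 128 using unsigned 16-bit arithmetic."""
    product = (39 * adc) & WORD            # 16-bit multiply
    numerator = (49214 - product) & WORD   # 16-bit subtract (would wrap if negative)
    return numerator >> 7                  # divide by 2^7 = 128: shift RIGHT 7 bits


for adc in (894, 622, 330):
    print(adc, temp_scaled(adc))
```

Because 39 * 1023 = 39897 is less than 49214, the numerator stays positive for every 10-bit reading, so the masks never actually change a value. The shift truncates; adding 64 to the numerator before shifting (49278 still fits in 16 bits) would round to the nearest degree instead.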

More work can yield better approximation.

Hope this helps
 
  • #8
TheoMcCloskey said:
Temp = (49214 - 39 * ADC) / 128


YES!
This works fine and I can tweak this just a little. The results are well within acceptable limits for this application.

Can you make a similar formula for the two remaining data sets? Please?

I have not a single clue how you did that. :blushing:
 
  • #9
RexLan said:
I have not a single clue how you did that. :blushing:

It would be better if they explained how they did it, so that you can make the formulas yourself. That way you will know how to do it in the future.
 
  • #10
berkeman said:
It would be better if they explained how they did it, so that you can make the formulas yourself. That way you will know how to do it in the future.

I agree ... can you help?

BTW: the link in your sig does not work.

Rex
 
  • #11
RexLan -- for some discussions on the mechanics and mathematics of linear regression, please search the web, as there are many good references. I've included two here that you may explore directly.

http://phoenix.phys.clemson.edu/tutorials/excel/regression.html

http://en.wikipedia.org/wiki/Linear_regression

If you need additional help, please don't hesitate to ask, as I can explain in more detail. However, the on-line sources probably explain it better than I could.

I used Excel only because it was convenient. You can use Excel's functions as mentioned in the first link, or use the Regression tool from Excel's Analysis ToolPak add-in.

I'll explore your other data -- try it yourself and we can compare.
 
  • #12
RexLan said:
BTW: the link in your sig does not work.

Thanks for the heads-up, the site appears to be down temporarily. I'll ping the webmaster.
 
  • #13
OK this is what I have done ….

For my Oil Pressure I plotted it with resistance on the X axis and Pressure on the Y axis.
Using the linear regression best-fit line I got an equation of Y = .4154(X) - 3.7542. Then I scaled this up by 10^4 and ended up with (4154X - 37542)/10,000. I then tweaked it to (4154X - 37542)/10000 - 1 and it is spot on accurate!
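If the 10^4-scaled pressure formula is evaluated on the chip itself, its intermediate values need to fit in unsigned 16-bit words (0 to 65535). The sketch below, assuming the PICAXE holds only unsigned 16-bit values, lists which of the measured resistances would push either 4154*X or 4154*X - 37542 out of that range; the helper names are illustrative.

```python
WORD_MAX = 0xFFFF  # largest unsigned 16-bit value

RES = [192, 179, 170, 156, 146, 132, 114, 108, 84, 73, 61, 48, 36, 24, 0]


def fits_16bit(value: int) -> bool:
    """True if value can be held in an unsigned 16-bit word."""
    return 0 <= value <= WORD_MAX


def out_of_range(xs):
    """Return the resistances whose intermediate results do not fit."""
    bad = []
    for x in xs:
        product = 4154 * x           # first intermediate: 4154*X
        numerator = product - 37542  # second intermediate: 4154*X - 37542
        if not (fits_16bit(product) and fits_16bit(numerator)):
            bad.append(x)
    return bad


print(out_of_range(RES))
```

For this data every point fails: 4154*X exceeds 65535 once X reaches 16, and the numerator goes negative below X = 10. A smaller scale factor keeps the product in range (the thread later uses 2^8 with a zero-intercept model for this sensor).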

For my Oil Temperature I did the same thing and came up with Temp = -(0.2834*ADC) + 363.51. I also tried a polynomial fit, and it was actually worse in the final calculations even though the line looked better.

Two problems, however. My little Picaxe processor cannot handle a negative number, and second, the fit isn't really good enough, so I am sort of stuck with this one now. I am seeing more than a 10 degree error and I need it to be better than this.
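One way around the negative-number problem is the trick used in post #7: write the fit as a large positive constant minus a positive product, scaled so both fit in 16 bits. The sketch below applies that to the oil-temperature line, assuming a scale factor of 2^7; the constants are simply 363.51 * 128 and 0.2834 * 128 rounded, and `oil_temp` is an illustrative name. It only removes the negative numbers; it does nothing about the 10 degree fit error.

```python
SCALE_SHIFT = 7                          # scale factor 2^7 = 128
A = round(363.51 * (1 << SCALE_SHIFT))   # 46529
B = round(0.2834 * (1 << SCALE_SHIFT))   # 36


def oil_temp(adc: int) -> int:
    """Temp ~ 363.51 - 0.2834*ADC using only non-negative 16-bit values."""
    numerator = A - B * adc   # B*adc <= 36*1023 = 36828 < A, never negative
    return numerator >> SCALE_SHIFT


for adc in (934, 527, 82):
    print(adc, oil_temp(adc))
```

Rounding the slope to 36/128 = 0.28125 costs up to about 2 degrees at the top of the ADC range, on top of the truncation in the shift.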
 
  • #14
RexLan – Very Good! – your results come close to mine, with some differences. These cases (2nd and 3rd data sets) are somewhat trickier than the first. I’ll explain further in a moment.

As promised, here are my results:

For the (oil) pressure case:

Observations: I made these observations of the data:

- The data set is missing values at psi = 75 and 35; these points are omitted, leaving 15 data points in total.

- The data (and physical conditions) suggests that the dependent variable (psi) is strictly positive, ie, the “y” values should always be non-negative (if this is incorrect, please advise).

- A graph of the data suggests a linear relationship may be plausible, although the behavior near zero suggests a higher-degree fit could be appropriate.

Analysis:

If we assume a Linear model: PSI = A0 + A1 * Resistance

That is, y = A0 + A1 * X + e

(For model completeness, we assume the error (e) is normally distributed with mean zero)

This yields the following:
A0 = -4.19205
A1 = 0.415549
R2 = 9.94279E-01
StdErr = 1.97799E+00

Visually, this model fits somewhat well, but it will predict negative values near x=0, something that is undesirable.

Another Model:

We can formulate a regression model that forces the intercept (A0) to be zero. This would help the situation near zero. Deleting the A0 intercept leaves a model with one parameter instead of two. This means sacrificing some measure of fit, but that loss may matter less than getting the right shape of fit (i.e., having the predictions pass through zero).

A linear model with zero intercept would then be

Y = A1 * X + e

To construct the normal equation associated with this model, we note that there is only one parameter (A1) in the model.

S = SUM [(y - A1 * x)^2]

dS/dA1 = -2 * SUM [x * (y - A1 * x)] = 0 ==> A1 = SUM(xy) / SUM(x^2)
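The zero-intercept normal equation above takes only two sums to evaluate. This sketch computes them for the 15 pressure points and then forms the 2^8-scaled integer coefficient used in the PicAxe math below; the variable names are illustrative.

```python
# Zero-intercept least squares: A1 = SUM(x*y) / SUM(x^2).
RES = [192, 179, 170, 156, 146, 132, 114, 108, 84, 73, 61, 48, 36, 24, 0]
PSI = [80, 70, 65, 60, 55, 50, 45, 40, 30, 25, 20, 15, 10, 5, 0]

sxy = sum(x * y for x, y in zip(RES, PSI))  # SUM(xy)
sxx = sum(x * x for x in RES)               # SUM(x^2)
A1 = sxy / sxx                              # real-math slope
a1 = int(A1 * 256)                          # scaled by 2^8 and truncated
print(sxy, sxx, round(A1, 6), a1)
```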

Using this model on the data, the following results are obtained

A1 = 3.84529E-01
R2 = 9.96189E-01
StdErr = 1.641588347

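Note that R^2 is slightly greater and the standard error slightly smaller for this model than for the previous one. Keep in mind, though, that for a zero-intercept model R^2 is normally computed about zero rather than about the mean, so the two R^2 values are not directly comparable.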

PicAxe Math:

If we want to use this model, let’s scale it to accommodate 16 bit math. Try a scale factor of 2^8 = 256:

a1 = [A1 * 256] = 98

Yc1 = (a1 * X) SHR 8 ; (98 * x ) / 256

Then (using integer math) the model produces the following predictions and associated errors

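Psi  Res  Yc1  Err_C1
 80  192   73    7
 70  179   68    2
 65  170   65    0
 60  156   59    1
 55  146   55    0
 50  132   50    0
 45  114   43    2
 40  108   41   -1
 30   84   32   -2
 25   73   27   -2
 20   61   23   -3
 15   48   18   -3
 10   36   13   -3
  5   24    9   -4
  0    0    0    0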
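The integer predictions in the table above can be checked with a few lines that apply (98 * X) >> 8 to each resistance; 98 * 192 = 18816, so the product stays within 16 bits for this data. `psi_from_res` is an illustrative name.

```python
RES = [192, 179, 170, 156, 146, 132, 114, 108, 84, 73, 61, 48, 36, 24, 0]
PSI = [80, 70, 65, 60, 55, 50, 45, 40, 30, 25, 20, 15, 10, 5, 0]
A1_SCALED = 98  # [A1 * 256]


def psi_from_res(x: int) -> int:
    """Integer prediction (98 * x) / 256 done as a right shift by 8."""
    return (A1_SCALED * x) >> 8


rows = []
for x, y in zip(RES, PSI):
    yc = psi_from_res(x)
    rows.append((y, x, yc, y - yc))
    print(f"{y:3d} {x:4d} {yc:4d} {y - yc:4d}")
```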

A somewhat better fit can be constructed with a quadratic (2nd order) model, which I’ll get into, but before I go there, let me ask some questions:

I’m very familiar with the x86 processor, but I’m not totally familiar with the PicAxe. Based on your dialog, I’m assuming it supports 16-bit math. However,
- does it support 16-bit SUBtraction
- does it support 16-bit ADDition
- does it support 16-bit SUBtraction with CARRY
- does it support 16-bit ADDition with CARRY
- does it support 16-bit MULTiplication (with carry?)
- does it support 16-bit signed MULTiplication
- does it support SHIFT (LEFT & RIGHT) functions
- etc. (any appropriate integer math function?)

I’m asking these questions because it bears directly on these “scaled” answers (note the factors of 2, assuming integer division by bit-shifting). I’ve been trying to keep all intermediate calculations within a 16-bit word length. There are several short tricks that can be used to carry out the computations of your equations if some additional features are available.

I’ll post some results for this data set and your third data set in a little while.

FYI - The third set offers some additional challenges, as the data may be somewhat “noisy” for a linear model. This suggests the need for better data(?), or possibly a higher order fit(?) – BUT – you should ask yourself “does the real world physics suggest a higher order model as being appropriate”?

More to follow
 
  • #15
Well ... I am humble and must admit that I don't understand a lot of this, even though I am 60 years old! I am but a simple pilot from Alaska and now retired into racing cars with too much time on my hands. You can see what I am up to here: http://www.tach.rex-deb.net

Yes -- Oil pressure is always positive 0-80 psi
All temperatures are always positive values.

Also, the Picaxe does not support (xxxx+xxxx / 3) * 2 type math. It performs the math left to right in the order it sees it, so it is limited in this regard as well.

The Picaxe is a clever little chip with a bootstrap program on the back of a PIC processor. However, it uses a simple form of basic and can be instantly programmed using a simple free editor SW. It is also quite inexpensive.

I am very poor with it so far and have had a lot of help, but I do try and I am learning. I have also done some very useful things with it, like making an altimeter.

Also, I will need to redo the oil pressure data. I have to build a fixture to actually pressurize and then run the Picaxe with the ADC to get real numbers. I made a crude one to see if it was linear by just looking at the resistance and that is what this data set is.

Here are the math functions available.
ADC readadc 10 bits
RAM peek, poke
Serial serin, serout
Program Flow goto, gosub, return, branch
Loops for...next
Mathematics let (+, -, *, **, /, //, max, min, &, |, ^, &/, |/, ^/ )
I2C readi2c, writei2c, i2cslave
PWM pwmout
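The `**` operator in the list above can help with precision. Assuming it behaves as the PICAXE manual describes (multiply two 16-bit values and keep the high 16 bits of the 32-bit product), a fractional slope can be stored as a fraction of 65536 instead of being rounded to a small integer. The sketch below emulates that in Python using the post #7 line Temp = 384.487 - 0.303707 * ADC; `mul_high` and `temp_q16` are hypothetical names.

```python
def mul_high(a: int, b: int) -> int:
    """Emulated '**': 16-bit x 16-bit multiply, keep the upper 16 bits."""
    if not (0 <= a <= 0xFFFF and 0 <= b <= 0xFFFF):
        raise ValueError("operands must be unsigned 16-bit values")
    return (a * b) >> 16


SLOPE_Q16 = round(0.303707 * 65536)  # slope as a fraction of 65536 (19904)
OFFSET = 384                         # integer part of the 384.487 intercept


def temp_q16(adc: int) -> int:
    """Temp ~ 384.487 - 0.303707 * ADC without floating point."""
    return OFFSET - mul_high(adc, SLOPE_Q16)


print(SLOPE_Q16, temp_q16(894), temp_q16(330))
```

Here the slope keeps about five significant digits (19904/65536 = 0.30371), and the result stays within about half a degree of the real-math line across the whole 10-bit range.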

I would also like to email after we get this done and I hope it will help others as well.

Thanks

motorsports @ rblantz.com
 
  • #16
RexLan - No need to be humble! -- please don't get me wrong, I am very impressed with the work of you and your colleagues as shown on your web site. I guess I'm somewhat envious as well. I'm (also) a frustrated 51 yr old engineer that is fascinated by some of the new "toys" available to us today but too bogged down with "real work issues" (for a few more years ) to avail any time & enjoyment. Anyways, I'd like to help in any areas of data analysis if I can. This is a field I enjoy doing as a hobby and I have spent a considerable amount of time (years) pursuing as an aside. Ah well...


Thanks for info on PicAxe -- I'll research it some more.

Also, the Picaxe does not support (xxxx+xxxx / 3) * 2 type math. It performs the math left to right in the order it sees it, so it is limited in this regard as well.

I realize the limitation on the type of math -- but I think we can take advantage of the order of operations to achieve a desired result by carefully formulating the math sequences. For example, consider the two expressions for a 2nd degree polynomial (SCREAM at me if I am being too noobish):

Code:
y1 = a0 + a1*x + a2*(x^2)

y2 = a0 + x*(a1 + a2*x)
The expression for y2 is preferred over that for y1 and can be sequenced to allow the intended computation:

t = a2*x + a1

y2 = x*t + a0
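To make the sequencing concrete, the sketch below models a BASIC that applies each operator in turn with no precedence and no parentheses (assumed here to match the Picaxe behavior described in the thread), and evaluates the quadratic in the nested form above. This nesting is Horner's method. `sequential` and `horner2` are hypothetical names; `/` uses floor division to mimic integer math on non-negative values.

```python
import operator

# Integer versions of the four basic operators.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}


def sequential(tokens):
    """Evaluate [value, op, value, op, value, ...] strictly in reading order.

    There is no precedence: 2 + 3 * 4 is treated as (2 + 3) * 4."""
    acc = tokens[0]
    for op, value in zip(tokens[1::2], tokens[2::2]):
        acc = OPS[op](acc, value)
    return acc


def horner2(a0, a1, a2, x):
    """y = a0 + a1*x + a2*x^2 evaluated as t = a2*x + a1, then y = x*t + a0."""
    t = sequential([a2, "*", x, "+", a1])
    return sequential([x, "*", t, "+", a0])


print(sequential([2, "+", 3, "*", 4]))  # 20
print(horner2(5, 3, 2, 7))              # 124 = 5 + 3*7 + 2*49
```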

As for the "other math" associated with the data fitting, I can explain it in much more detail. Maybe that would be best done offline so as not to crowd the bandwidth of this forum (unless others feel differently).

The principal challenge is to see if we can do the necessary data predictions to sufficient accuracy using the limited math and data representation of the Picaxe.

I’ll post more on the second order fit (with zero intercept) for the 2nd data set and some info on the third data set in a little while.
 
  • #17
I have redone the oil pressure data and was quite surprised at the findings. The original data was pressure vs. resistance, and it was almost a straight line.

The new data, and the one I really need, is the actual ADC reading from the processor with the sender and voltage divider using a 100 ohm resistor. The sender is 0-190 ohm.

The plot of the new data is almost logarithmic or something and totally crazy. I did it twice with my precision gauge and the data is very accurate.

This is the data and the hook-up.

      5 v
       +
       |
       |
      .-.
      | | Sender
      | |
      '-'
       |
       |       .--------.
       o-------| Picaxe |
       |       '--------'
      .-.
      | |
      | | 100 ohm
      '-'
       |
       |
      ===
      GND

Psi  ADC (100 ohm divider resistor)
85 667
80 664
75 661
70 654
65 639
60 624
55 607
50 589
45 569
40 542
35 514
30 482
25 443
20 398
15 345
10 282
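Much of the curve's shape may come from the divider itself. For an ideal divider read by a 10-bit ADC referenced to the same 5 V supply (an assumption, approximating full scale as 1023 counts), the reading is 1023 * R_low / (R_low + R_high), where R_low is the resistance between the ADC node and ground and R_high the resistance between 5 V and the node. The sketch below computes that and its inverse; the function names are illustrative.

```python
FULL_SCALE = 1023  # approximate full-scale count of a 10-bit ADC


def adc_from_divider(r_low: float, r_high: float) -> float:
    """Ideal ADC count for r_low (node to ground) and r_high (supply to node)."""
    return FULL_SCALE * r_low / (r_low + r_high)


def r_low_from_adc(adc: float, r_high: float) -> float:
    """Inverse of adc_from_divider; valid for 0 <= adc < FULL_SCALE."""
    return r_high * adc / (FULL_SCALE - adc)


# Earlier resistance readings (post #6) vs. new ADC readings at the same psi.
for psi, ohms, measured in [(80, 192, 664), (60, 156, 624), (40, 108, 542),
                            (20, 61, 398), (10, 36, 282)]:
    print(psi, round(adc_from_divider(ohms, 100)), measured)
```

Using the earlier post #6 resistances as R_low with 100 ohm as R_high, the predicted counts for the pressures listed land within about a dozen counts of the new readings (the 45 psi point is further off, about 24 counts). The R/(R + 100) term flattens as R grows, which is consistent with the bending at the top of the range. Converting the ADC reading back to resistance first and then applying a straight-line pressure fit is one possible alternative to fitting the curve directly.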
 
  • #18
Wow! -- funky. If it were not for the last three points (75-85 psi), I could get a good linear transformation via Y = a / (b - X)^p. However, "p" would be close to .25 and we would be stuck computing a couple of square roots or a 4th root (can be done with extra code and execution time - yuck). Could we split the fit into two parts? (yuck)

I'll construct the polynomial fits (linear, quadratic, and cubic), but the quadratic and cubic may be difficult to compute on the Picaxe due to the math limitations - let's see.
 
  • #19
OK – I ran some polynomial regressions on the latest data. I stuck to the polynomial models as they at least have a chance of being computed with the PicAxe math. Other exotic models would not likely be supported with the restrictions of integer math.

The regressions were not hard at all, but formulating the integer math sequences to emulate the real math took some time. I’ll present the real math results for the three polynomial models of order 1 (linear), 2 (quadratic), and 3 (cubic). I’ll then provide a suggestion for implementing the order 2 case with integer math.

Before performing the regression, I transformed the X variable (ADC) by its approximate median (475). That is, I let u = (x – 475) and regressed on y and u. The reason I did this will be clear in a moment.

Linear

A0 = 3.48892E+01
A1 = 1.86827E-01
R^2 = 9.10947E-01
StdErr = 7.35307E+00

Quadratic

A0 = 2.72311E+01
A1 = 1.67220E-01
A2 = 4.87571E-04
R^2 = 9.82796E-01
StdErr = 3.35387E+00


Cubic

A0 = 2.82827E+01
A1 = 1.17918E-01
A2 = 4.37581E-04
A3 = 1.77526E-06
R^2 = 9.92472E-01
StdErr = 2.30916E+00

Discussion:
Real Math Regressions:
The results of the linear case are less than desirable. The average magnitude of the residuals is 5.7, as the large standard error suggests. The percent fit (R^2) is roughly 91%.

The results of the quadratic case are a little better, as the R^2 value and standard error suggest. I explored this model as it had a fair chance of being handled with the Picaxe math and it was not an overly complicated case. By the time I’m done, the average magnitude of the residuals will be 2.3 and the standard error will be about 3.06, pretty close to the real math case.

The results of the cubic case are better still, but I won’t pursue this at this time as it is difficult to model with the Picaxe math.

Picaxe suggestion.
As I said, if the linear case is still not desirable, then I suggest the quadratic case. This is a compromise as better models may exist, but they tend to be too exotic for the simple math engine of the Picaxe. With that, consider this scheme:

Code:
STEP ZERO – scale up integer representation of the coefficients.

Let K = 2^8 = 256

a0 = A0 * K = 6971
a1 = A1 * K^2 = 10959
a2 = A2 * K^2 = 32
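The STEP ZERO coefficients follow directly from the real-math quadratic results; this sketch reproduces them (K = 2^8, as in the post) so they can be regenerated if the fit is redone with new data. Names mirror the post.

```python
# Scaled integer coefficients for the quadratic fit (STEP ZERO).
A0, A1, A2 = 2.72311e1, 1.67220e-1, 4.87571e-4  # real-math quadratic fit
K = 2 ** 8

a0 = round(A0 * K)        # 6971
a1 = round(A1 * K ** 2)   # 10959
a2 = round(A2 * K ** 2)   # 32
print(a0, a1, a2)
```

Note that a1 and a2 carry a scale of K^2 = 2^16 while a0 carries only K = 2^8; the extra 2^8 is divided back out by the 2^6 and 2^2 shifts in STEP 3 and STEP 5, so all three terms share the same 2^8 scale by STEP 6.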
Now consider the model depending on the value of X

Code:
If (X > 475) Then 

   u = x - 475

   y = a0 + u * ( a1 + u * a2 )

Else

   u = 475 - x

   y = a0 - u * ( a1 - u * a2 )

End If
The following steps can now be used. For now, consider the case for X > 475. "Code comments" are after the semi-colon.

Code:
STEP 1 – compute a2 * u

   AX1 = a2 * u         ; AX1 = (A2*2^16) * u

STEP 2 – Add a1

   AX2 = AX1 + a1      ; AX2 = (A2*2^16) * u + (A1*2^16)

STEP 3 – scale down slightly (by factor 2^6)

   AX3 = [AX2 / 2^6]  ; AX3 = (A2*2^10) * u + (A1*2^10)

STEP 4 – multiply by u

   AX4 = AX3 * u

STEP 5 – scale down by the remaining factor of 2^2

   AX5 = [AX4 / 2^2]  ; AX5 = ((A2*2^8) * u + (A1*2^8)) * u
 
STEP 6 – add a0

   AX6 = AX5 + a0      ; AX6 = ((A2*2^8) * u + (A1*2^8)) * u + (A0*2^8)

STEP 7 – compute Yp by scaling down AX6

   Yp = [AX6 / 2^8]

FINISHED

Of course, the variables (ie, AX1, AX2, etc) can be re-used and we don’t need separate ones for each step. I only did this for clarity.

When X <= 475, compute u = 475 - x, and two of the above steps need to be modified.

Code:
STEP 2 – SUBTRACT AX1 from a1
   AX2 = a1 - AX1   ; AX2 = (A1*2^16) - (A2*2^16) * u 

STEP 6 – SUBTRACT AX5 from a0

   AX6 = a0 - AX5   ; AX6 = (A0*2^8) - u * ((A1*2^8) - (A2*2^8) * u)

All the remaining steps stay the same.

These series of steps, as convoluted as they may seem, are necessary so that all integer math stays positive and within range of a 16 bit register. The end result of this approach yields the following fit:

Code:
Y    X     Yp    ErrYp
85  667    77      8
80  664    76      4
75  661    75      0
70  654    72     -2
65  639    67     -2
60  624    62     -2
55  607    57     -2
50  589    52     -2
45  569    47     -2
40  542    40      0
35  514    34      1
30  482    28      2
25  443    22      3
20  398    17      3
15  345    13      2
10  282    13     -3
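The STEP 1 to STEP 7 sequence can be checked on a PC before it goes onto the chip. The sketch below follows the steps with integer arithmetic and, after each add, subtract, or multiply, raises an error if the value leaves 0..65535 (an emulation of unsigned 16-bit words, not Picaxe code); `check16` and `psi_from_adc` are illustrative names.

```python
# Emulation of STEP 1..STEP 7 for the quadratic pressure fit.
A0_S, A1_S, A2_S = 6971, 10959, 32   # a0, a1, a2 from STEP ZERO (K = 2^8)
CENTER = 475                         # the x value the fit was centred on


def check16(value: int) -> int:
    """Return value unchanged, or raise if it would not fit in 16 bits."""
    if not 0 <= value <= 0xFFFF:
        raise OverflowError(f"{value} does not fit in an unsigned 16-bit word")
    return value


def psi_from_adc(x: int) -> int:
    above = x > CENTER
    u = check16(x - CENTER if above else CENTER - x)
    ax = check16(A2_S * u)                            # STEP 1: a2 * u
    ax = check16(ax + A1_S if above else A1_S - ax)   # STEP 2 (add or subtract)
    ax >>= 6                                          # STEP 3: divide by 2^6
    ax = check16(ax * u)                              # STEP 4: multiply by u
    ax >>= 2                                          # STEP 5: divide by 2^2
    ax = check16(ax + A0_S if above else A0_S - ax)   # STEP 6 (add or subtract)
    return ax >> 8                                    # STEP 7: divide by 2^8


ADC_READINGS = [667, 664, 661, 654, 639, 624, 607, 589,
                569, 542, 514, 482, 443, 398, 345, 282]
PSI = [85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10]

for psi, x in zip(PSI, ADC_READINGS):
    yp = psi_from_adc(x)
    print(f"{psi:3d} {x:4d} {yp:4d} {psi - yp:4d}")
```

For the measured readings this reproduces the Yp column above. The range checks matter: every intermediate fits only for readings from 133 to 704; above 704 the STEP 4 product exceeds 65535, and below 133 the modified STEP 2 subtraction goes negative.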

As mentioned in an earlier post, the endpoints for high X,Y show high residuals. You may want to treat the top three points as a separate case (with additional data within these points).

It's too bad that the Picaxe math does not support at least a Carry flag indication for under/overflow in math operations, and multi-word operations. We could do much more if we had at least that much.

Anyways, hope this helps.
 
  • #20
Good Morning, (Theo):

Well, I can see you have done your homework today! You have mentioned a couple of times about "noise" in the data so I am going to re-do all of it again today.

I have set up a DS18B20 digital thermometer chip accurate to 1/2 degree F and a backup hand held unit. I use oil and have a hot plate setup so I can raise the temperature slowly.

I have a Picaxe setup and it is taking 4 samples a second and giving me an average ADC reading. I have also measured the actual divider resistor so I have an exact value for it, but will just use the generic value (like 100 ohm) for discussion.

I would like to put this in an Excel file and email it to you if possible because the three sets will be long. Let me know.

I am also going to redo the Oil Pressure again and concentrate on the 65-90 range with double the samples.

I hope to have this by mid/late afternoon.

Thanks again for the assistance,
Rex
 
  • #21
RexLan - very good, thanks. I'll look forward to the additional data. I'd like to PM you with an additional email address that will get to me sooner today than the one in the public profile - hope you don't mind.

Please don't knock yourself out over the temperature data -- it's been my experience that this kind of data can be quite fickle even in the best of measurement environments.

Best of luck!
 

