Numerical floating point addition

Khashishi · Feb 28, 2013

If I have a bunch of sorted numbers spanning a large range of magnitudes, is it better to add them up from smallest to largest or from largest to smallest, or something else?

Let's say I'm summing an array A, which is sorted from large to small. Which gives a more accurate result:

sum1 = 0
for i=0,length(A)-1
sum1 += A
end

sum2 = 0
for i=length(A)-1,0
sum2 += A
end

rcgldr · Feb 28, 2013

Adding the numbers in order, smallest to largest (in magnitude) is better (less truncation error).

It's also possible to further reduce truncation error by using a function and an array of 2048 doubles to hold intermediate sums where the index into the array is based on the exponent of a double precision number ( in C the index = (* (unsigned __int64 *)(&number)) >> 52) & 0x7ff; ). The array is initialized to zero, and each time a new number is to be added, the index for that number is generated. If array[index] == 0. , then the number is just stored, else number = number + array[index]; array[index] = 0.; a new index for number is generated and the process repeated until array[index] == 0 and the number is stored (or until overflow is detected). Once all numbers have been added, then the array is summed from index = 0 to index = 0x7ff to produce the sum. The purpose of this is minimize truncation.

rbj · Feb 28, 2013

if they're floating point, that's a no-brainer.

start with zero and add the small ones (in magnitude, negative or positive) first.

Numerical floating point addition

Related to Numerical floating point addition

1. What is numerical floating point addition?

2. How does numerical floating point addition differ from regular addition?

3. What are some common errors associated with numerical floating point addition?

4. How can numerical floating point addition be used in scientific research?

5. Are there any alternatives to numerical floating point addition?

Similar threads

Hot Threads

Recent Insights