Numerical floating point addition

In summary: If you have a sorted array of numbers spanning a large range of magnitudes, it is better to add them from smallest to largest (in magnitude) in order to reduce rounding error. The error can be reduced further by using a function and an array of 2048 doubles, indexed by each number's exponent, to hold intermediate sums.
  • #1
Khashishi
If I have a bunch of sorted numbers spanning a large range of magnitudes, is it better to add them up from smallest to largest or from largest to smallest, or something else?

Let's say I'm summing an array A, which is sorted from large to small. Which gives a more accurate result:

sum1 = 0
for i=0,length(A)-1
sum1 += A[i]
end

sum2 = 0
for i=length(A)-1,0,-1
sum2 += A[i]
end
  • #2
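Adding the numbers in order of magnitude, smallest to largest, is better (less rounding error). The small terms get a chance to accumulate into something significant before they meet the large ones, instead of being rounded away one at a time against a large partial sum.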
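To make the difference concrete, here is a minimal C sketch of the two loops from the question. The function names `sum_forward` and `sum_backward` are made up for illustration, and the comments assume IEEE 754 doubles with round-to-nearest and plain double evaluation (as on typical x86-64 builds).

```c
#include <stddef.h>

/* Sum a[0..n-1] from the first element to the last.
   For an array sorted large to small this adds the largest terms first. */
double sum_forward(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Sum a[0..n-1] from the last element to the first.
   For an array sorted large to small this adds the smallest terms first. */
double sum_backward(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = n; i-- > 0; )
        s += a[i];
    return s;
}
```

For example, take a[0] = 1.0 followed by 1024 copies of 2^-53. Under the assumptions above, `sum_forward` returns exactly 1.0, because each tiny term is rounded away against the running sum, while `sum_backward` returns 1 + 2^-43, which is the exact answer.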

It's also possible to reduce rounding error further by using a function and an array of 2048 doubles to hold intermediate sums, where the index into the array is the exponent field of a double precision number (in C, index = ((*(unsigned __int64 *)(&number)) >> 52) & 0x7ff;). The array is initialized to zero, and each time a new number is to be added, the index for that number is generated. If array[index] == 0., the number is simply stored there. Otherwise number = number + array[index]; array[index] = 0.; a new index is generated for number, and the process is repeated until array[index] == 0. and the number is stored (or until overflow is detected). Once all numbers have been added, the array is summed from index = 0 to index = 0x7ff to produce the sum. The purpose of this is to minimize rounding error: a number is only ever added to a stored value that has the same exponent.
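The scheme described above could be sketched in C roughly as follows. This is one reading of the post, not the poster's actual code: the names `ExpSum`, `exponent_index`, `expsum_clear`, `expsum_add` and `expsum_total` are invented here, `uint64_t` with `memcpy` is swapped in for the `unsigned __int64` pointer cast to stay portable, and the handling of infinities and NaNs (collecting them in the last slot) is an assumption, since the post only says "until overflow is detected". It assumes doubles are IEEE 754 binary64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_BUCKETS 2048          /* one slot per 11-bit exponent value */

typedef struct {
    double bucket[NUM_BUCKETS];   /* intermediate sums, indexed by exponent */
} ExpSum;

/* Biased exponent field (bits 52..62) of a binary64 value; sign is masked off. */
static unsigned exponent_index(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (unsigned)((bits >> 52) & 0x7ff);
}

/* Initialize every slot to zero. */
void expsum_clear(ExpSum *s)
{
    for (size_t i = 0; i < NUM_BUCKETS; i++)
        s->bucket[i] = 0.0;
}

/* Add one number: combine it with the stored value of equal exponent,
   empty that slot, and retry with the new exponent until a slot is free. */
void expsum_add(ExpSum *s, double x)
{
    for (;;) {
        unsigned i = exponent_index(x);
        if (i == 0x7ff) {             /* infinity or NaN (assumed handling) */
            s->bucket[i] += x;
            return;
        }
        if (s->bucket[i] == 0.0) {    /* free slot: just store the number */
            s->bucket[i] = x;
            return;
        }
        x += s->bucket[i];            /* same exponent: add, then re-index */
        s->bucket[i] = 0.0;
    }
}

/* Final result: add up the slots from index 0 to index 0x7ff. */
double expsum_total(const ExpSum *s)
{
    double total = 0.0;
    for (size_t i = 0; i < NUM_BUCKETS; i++)
        total += s->bucket[i];
    return total;
}
```

Each pass through the loop in `expsum_add` either stores the number or empties one occupied slot, so it runs at most 2048 times. Because the final pass goes from index 0 upward, the stored values are also added from smallest to largest magnitude, and the input does not need to be sorted first.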
  • #3
If they're floating point, that's a no-brainer.

Start with zero and add the small ones (in magnitude, negative or positive) first.
 

Related to Numerical floating point addition

1. What is numerical floating point addition?

Numerical floating point addition is the addition of numbers stored in floating point format, in which each value is held as a significand and an exponent so that a large range of magnitudes can be represented. It is commonly used in computer programming and scientific calculations.

2. How does numerical floating point addition differ from regular addition?

Floating point numbers are stored in binary in a form of scientific notation: a fixed number of significant bits plus an exponent. This gives a very wide range of magnitudes, but every result must be rounded to the available precision. Unlike exact arithmetic, the computed sum can therefore depend on the order in which the terms are added.

3. What are some common errors associated with numerical floating point addition?

One common error is rounding error, where the result of the addition is slightly off because of the limited precision of the calculation. Another is overflow, where the result is too large in magnitude to be represented. Underflow, where the result is too close to zero to be represented accurately, can also occur.
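The three effects can be observed with a few lines of C. This is only an illustrative sketch: the function names are hypothetical, and the results noted in the comments assume IEEE 754 doubles with round-to-nearest and plain double evaluation. The `volatile` qualifiers are there to discourage the compiler from folding the expressions at higher precision.

```c
#include <float.h>
#include <stdbool.h>

/* Rounding: 0.1 and 0.2 are not exactly representable in binary,
   and their rounded sum differs from the double nearest to 0.3. */
bool rounding_is_visible(void)
{
    volatile double a = 0.1, b = 0.2;
    double sum = a + b;
    return sum != 0.3;
}

/* Overflow: doubling the largest finite double gives infinity. */
bool overflow_gives_infinity(void)
{
    volatile double big = DBL_MAX;
    double result = big * 2.0;
    return result > DBL_MAX;
}

/* Underflow: the square of the smallest normal double is far below
   the smallest subnormal and rounds to zero. */
bool underflow_gives_zero(void)
{
    volatile double tiny = DBL_MIN;
    double result = tiny * tiny;
    return result == 0.0;
}
```

Under the stated assumptions all three functions return true.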

4. How can numerical floating point addition be used in scientific research?

Numerical floating point addition is essential in scientific research for performing complex calculations involving decimal numbers. It is used in fields such as physics, engineering, and finance to accurately model real-world phenomena and analyze data.

5. Are there any alternatives to numerical floating point addition?

There are alternatives, such as fixed-point arithmetic or specialized hardware, but floating point addition is the most commonly used and versatile method for handling non-integer numbers in scientific and technical calculations.
