Data type: Float Point or Double?

In summary, the conversation discusses the use of data types in computational physics and the potential use of float as an alternative to double. The experts in the conversation mention that the use of double is preferred for its accuracy and efficiency in certain calculations. They also discuss how different compilers may treat float and double data types differently and provide examples of assembler code to support their claims.
  • #1
dimensionless
I'm sure one could find a reason to use any data type. There is a lot you can do with just integers. I do, however, have a computational physics book, and it makes wide use of the type double. Could float be used as an alternative?
 
  • #2
dimensionless said:
I'm sure one could find a reason to use any data type. There is a lot you can do with just integers. I do, however, have a computational physics book, and it makes wide use of the type double. Could float be used as an alternative?

They use double because double-precision floating-point values give the best accuracy: calculations carry more significant digits, so as little precision as possible is lost to rounding.

Float would be fine for a homework problem, but I would not use it when calculating vectors on a trip to Mars.
 
  • #3
Languages have limits on their implementations of data types.
In C, "#include <float.h>" brings in defined values that tell you how much precision a floating-point datatype has (<limits.h> covers only the integer types).

Number of accurate digits in floating point from HP-UX (UNIX) C:

FLT_DIG 6 digits of precision
DBL_DIG 15 digits of precision

Which would you rather have when, with a floating-point processor, "double" math operations take no more CPU than "float" math operations?
 
  • #4
Jim,

Different compilers, in fact, regard the types "float" and "double" differently. On most compilers, floats are 4 bytes and doubles are 8 bytes.

Also, double-precision arithmetic is certainly slower than single-precision arithmetic. Modern processors have vector-math units (MMX, SSE, etc.), the use of which is scheduled by your compiler. You can do twice as many single-precision operations per unit time as double-precision operations with these vector-math units.

- Warren
 
  • #5
chroot -

I disagree with your statement that double precision is always slower than single. If your model were correct everywhere, the following assembler sequences would never occur.

HPUX V-class PA-RISC boxes preferentially use double precision FP
operations, because they are more efficient. I've been on other platforms
where this is also true. Here is a concrete HPUX 11.00 example.

Consider some C code compiled with cc -S myfile.c -DTYPE=<float or double, see below> to create ASM:

Code:
#include <math.h>
/* TYPE defined by -DTYPE=float or -DTYPE=double */
TYPE process(TYPE a, TYPE b)
{
    TYPE tmp=a;
    tmp -=.05;
    tmp*=.5;
    tmp+=b;
    if ( fabs(a+b)>0.) tmp/=(a+b);
    return tmp;
}
On HPUX V-class PA-RISC boxes, this is the assembler produced when the code is compiled with -DTYPE=double.
I put *'s in front of one area of interest. Note that there are no FCNV calls,
and all FP operations (FMPY, FSUB, etc.) are on double-precision FP numbers.
Code:
process
        .PROC
        .CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED,ORDERING_AWARE
        .ENTRY
        STW     %r2,-20(%r30)   ;offset 0x0
        LDO     64(%r30),%r30   ;offset 0x4
*       FSTD    %fr5,-104(%r30) ;offset 0x8
*       FSTD    %fr7,-112(%r30) ;offset 0xc
*       FLDD    -104(%r30),%fr4 ;offset 0x10
*       FSTD    %fr4,-56(%r30)  ;offset 0x14
*       FLDD    -56(%r30),%fr5  ;offset 0x18
*       LDIL    LR'S$6$process,%r1      ;offset 0x1c
*       FLDD    RR'S$6$process(%r1),%fr6        ;offset 0x20
*       FSUB,DBL        %fr5,%fr6,%fr7  ;offset 0x24
        FSTD    %fr7,-56(%r30)  ;offset 0x28
        FLDD    -56(%r30),%fr8  ;offset 0x2c
        LDIL    LR'S$6$process,%r31     ;offset 0x30
        FLDD    RR'S$6$process+8(%r31),%fr9     ;offset 0x34
        FMPY,DBL        %fr8,%fr9,%fr10 ;offset 0x38
        FSTD    %fr10,-56(%r30) ;offset 0x3c
        FLDD    -56(%r30),%fr11 ;offset 0x40
        FLDD    -112(%r30),%fr22        ;offset 0x44
        FADD,DBL        %fr11,%fr22,%fr23       ;offset 0x48
        FSTD    %fr23,-56(%r30) ;offset 0x4c
        FLDD    -104(%r30),%fr24        ;offset 0x50
        FLDD    -112(%r30),%fr25        ;offset 0x54
        FADD,DBL        %fr24,%fr25,%fr5        ;offset 0x58
        LDIL    L'fabs,%r31     ;offset 0x5c
        .CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
        BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x60
        COPY    %r31,%r2        ;offset 0x64
        FCPY,DBL        %fr0,%fr26      ;offset 0x68
        FCMP,DBL,>      %fr4,%fr26      ;offset 0x6c
        FTEST           ;offset 0x70
        B,N     $00000001       ;offset 0x74
        FLDD    -104(%r30),%fr27        ;offset 0x78
        FLDD    -112(%r30),%fr28        ;offset 0x7c
        FADD,DBL        %fr27,%fr28,%fr29       ;offset 0x80
        FLDD    -56(%r30),%fr30 ;offset 0x84
        FDIV,DBL        %fr30,%fr29,%fr31       ;offset 0x88
        ..... code omitted to save space.

When the same code is compiled with -DTYPE=float, note that FCNV is called to convert float to double, and FMPY and FSUB use double precision. FADD does not.
Code:
process                                                             
        .PROC                                                       
        .CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED,ORDERING_AWARE 
        .ENTRY                                                      
        STW     %r2,-20(%r30)   ;offset 0x0                         
        LDO     64(%r30),%r30   ;offset 0x4                         
*       FSTW    %fr4L,-100(%r30)        ;offset 0x8                 
*       FSTW    %fr5L,-104(%r30)        ;offset 0xc                 
*       FLDW    -100(%r30),%fr4L        ;offset 0x10                
*       FSTW    %fr4L,-56(%r30) ;offset 0x14                        
*       FLDW    -56(%r30),%fr4R ;offset 0x18                        
*       FCNV,SGL,DBL    %fr4R,%fr4      ;offset 0x1c                
*       LDIL    LR'S$6$process,%r1      ;offset 0x20                
*       FLDD    RR'S$6$process(%r1),%fr5        ;offset 0x24        
*       FSUB,DBL        %fr4,%fr5,%fr6  ;offset 0x28                
*       FCNV,DBL,SGL    %fr6,%fr5L      ;offset 0x2c                
        FSTW    %fr5L,-56(%r30) ;offset 0x30                        
        FLDW    -56(%r30),%fr5R ;offset 0x34                        
        FCNV,SGL,DBL    %fr5R,%fr7      ;offset 0x38                
        LDIL    LR'S$6$process,%r31     ;offset 0x3c                
        FLDD    RR'S$6$process+8(%r31),%fr8     ;offset 0x40        
        FMPY,DBL        %fr7,%fr8,%fr9  ;offset 0x44                
        FCNV,DBL,SGL    %fr9,%fr6L      ;offset 0x48                
        FSTW    %fr6L,-56(%r30) ;offset 0x4c                        
        FLDW    -56(%r30),%fr6R ;offset 0x50                        
        FLDW    -104(%r30),%fr7L        ;offset 0x54                
        FADD,SGL        %fr6R,%fr7L,%fr7R       ;offset 0x58        
        FSTW    %fr7R,-56(%r30) ;offset 0x5c                        
        FLDW    -100(%r30),%fr8L        ;offset 0x60                
        FLDW    -104(%r30),%fr8R        ;offset 0x64                
        FADD,SGL        %fr8L,%fr8R,%fr9L       ;offset 0x68        
        FCNV,SGL,DBL    %fr9L,%fr5      ;offset 0x6c                
        LDIL    L'fabs,%r31     ;offset 0x70                        
        .CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
        BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x74                
        COPY    %r31,%r2        ;offset 0x78                        
        FCPY,DBL        %fr0,%fr10      ;offset 0x7c                
        FCMP,DBL,>      %fr4,%fr10      ;offset 0x80                
        FTEST           ;offset 0x84                                
        B,N     $00000001       ;offset 0x88   
        .... code omitted
Please notice that floats are converted to doubles (FCNV mnemonic is float convert)
before most arithmetic operations on floats.
 
  • #6
Jim,

I never said double-precision is always slower than single-precision. I just took offense to your statement that single-precision is never faster than double-precision. I don't have much experience with the PA-RISC instruction set, but, on many processors, under many conditions, single-precision is indeed faster than double-precision.

- Warren
 

Related to Data type: Float Point or Double?

1. What is the difference between the float and double data types?

Both float and double store floating-point (fractional) numbers in a computer's memory. The main difference is precision and range. In C on typical IEEE 754 platforms, a float occupies 4 bytes and holds about 6 to 7 significant decimal digits, while a double occupies 8 bytes and holds about 15 to 16. A double also has a wider exponent range, so it can represent much larger and much smaller magnitudes than a float.

2. When should I use float and when should I use double?

The choice depends on the specific needs of your project. If you need high precision and accuracy, as in most numerical physics, double is the better choice. If you are working with a large dataset and do not require high precision, float may suffice and halves the memory used.

3. Can I convert a float to a double and vice versa?

Yes. Converting a float to a double is exact, because every float value is also representable as a double. Converting a double to a float rounds to the nearest representable float, so precision can be lost, and values outside the float range cannot be represented. Convert from double to float only when necessary, and check that the reduced precision is acceptable.

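4. Is there a performance difference between float and double?

There can be, but it depends on the hardware and the compiler. A double takes twice the memory and memory bandwidth of a float, and vector units can process twice as many floats as doubles per instruction, so float is often faster for large arrays. On other hardware, such as the PA-RISC example in the thread above, float arithmetic may be carried out in double precision with extra conversion instructions, so float is not automatically the faster choice. Measure on your own platform if performance matters.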

5. Are float and double supported in all programming languages?

Most programming languages provide single- and double-precision floating-point types, but not all do, and the type names, sizes, and ranges vary from language to language. Refer to the documentation of the language you are using to ensure proper usage of these data types.
