c++ - float128 and double-double arithmetic

Question

Welcome To Ask or Share your Answers For Others

c++ - float128 and double-double arithmetic

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I've seen in wikipedia that someway to implement quad-precision is to use double-double arithmetic even if it's not exactly the same precision in terms of bits: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format

In this case, we use two double to store the value. So we make two operations to compute the result, one for each double of the result.

In this case we can have round-off errors on each double or their is a mechanism that avoid this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

686 views

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:33:46+0000

“In this case, we use two double to store the value. So we need to make two operations at each time.”

This is not how double-double arithmetic works. You should expect one double-double operation to be implemented in anywhere from 6 to 20 double operations depending on the actual operation being implemented, the availability of a fused-multiply-add operation, the assumption that one operand is larger than the other, …

For instance, here is one implementation of a double-double multiplication for when an FMA instruction is not available, taken from CRlibm:

#define Mul22(zh,zl,xh,xl,yh,yl)                      
{                                                     
double mh, ml;                                        
                              
  const double c = 134217729.;                
  double up, u1, u2, vp, v1, v2;              
                              
  up = (xh)*c;        vp = (yh)*c;            
  u1 = ((xh)-up)+up;  v1 = ((yh)-vp)+vp;          
  u2 = (xh)-u1;       v2 = (yh)-v1;                   
                              
  mh = (xh)*(yh);                     
  ml = (((u1*v1-mh)+(u1*v2))+(u2*v1))+(u2*v2);        
                              
  ml += (xh)*(yl) + (xl)*(yh);                
  *zh = mh+ml;                        
  *zl = mh - (*zh) + ml;                              
}

The first 8 operations alone are for dividing exactly each double from the operands into two halves so that one half from each side can be multiplied with one half from the other side and the result obtained exactly as a double. The computations u1*v1, u1*v2, … do exactly that.

The values obtained in mh and ml can overlap, so the last 3 operations are there to renormalize the result into the sum of two floating-point numbers.

In this case we can have round-off errors on each double or their is a mechanism that avoid this?

As the comment says:

/*
 * computes double-double multiplication: zh+zl = (xh+xl) *  (yh+yl)
 * relative error is smaller than 2^-102
 */

You can find about all the mechanisms used to achieve these results in the Handbook of Floating-Point Arithmetic.

Categories

c++ - float128 and double-double arithmetic

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags