How do you print the EXACT value of a floating point number?

后端 未结 10 1313
自闭症患者
自闭症患者 2020-11-27 03:45

First of all, this is not a floating point newbie question. I know results of floating point arithmetic (not to mention transcendental functions) usually cannot be represent

10条回答
  •  误落风尘
    2020-11-27 04:19

    There are 3 ways

    1. printing numbers in bin or hex

      This is the most precise way. I prefer hex because it is more like base 10 for reading/feel like for example F.8h = 15.5 no precision loss here.

    2. printing in dec but only the relevant digits

      With this I mean only digits which can have 1 in your number represented as close as possible.

      num of integer digits are easy and precise (no precision loss):

      // n10 - base 10 integer digits
      // n2  - base 2 integer digits
      n10=log10(2^n2)
      n10=log2(2^n2)/log2(10)
      n10=n2/log2(10)
      n10=ceil(n2*0.30102999566398119521373889472449)
      // if fist digit is 0 and n10 > 1 then n10--
      

      num of fractional digits are more tricky and empirically I found this:

      // n10 - base 10 fract. digits
      // n2  - base 2 fract. digits >= 8
      n10=0; if (n02==8) n10=1;
      else if (n02==9) n10=2;
      else if (n02> 9)
          {
          n10=((n02-9)%10);
               if (n10>=6) n10=2;
          else if (n10>=1) n10=1;
          n10+=2+(((n02-9)/10)*3);
          }
      

      if you make a dependency table n02 <-> n10 then you see that constant 0.30102999566398119521373889472449 is still present, but at start from 8 bits because less cannot represent 0.1 with satisfactory precision (0.85 - 1.15). because of negative exponents of base 2 the dependency is not linear, instead it patterns. This code works for small bit count (<=52) but at larger bit counts there can be error because used pattern do not fit log10(2) or 1/log2(10) exactly.

      for larger bit counts I use this:

      n10=7.810+(9.6366363636363636363636*((n02>>5)-1.0));
      

      but that formula is 32bit aligned !!! and also bigger bit count ads error to it

      P.S. further analysis of binary representation of decadic numbers

      0.1
      0.01
      0.001
      0.0001
      ...
      

      may reveal the exact pattern repetition which would lead to exact number of relevant digits for any bit count.

      for clarity:

      8 bin digits -> 1 dec digits
      9 bin digits -> 2 dec digits
      10 bin digits -> 3 dec digits
      11 bin digits -> 3 dec digits
      12 bin digits -> 3 dec digits
      13 bin digits -> 3 dec digits
      14 bin digits -> 3 dec digits
      15 bin digits -> 4 dec digits
      16 bin digits -> 4 dec digits
      17 bin digits -> 4 dec digits
      18 bin digits -> 4 dec digits
      19 bin digits -> 5 dec digits
      20 bin digits -> 6 dec digits
      21 bin digits -> 6 dec digits
      22 bin digits -> 6 dec digits
      23 bin digits -> 6 dec digits
      24 bin digits -> 6 dec digits
      25 bin digits -> 7 dec digits
      26 bin digits -> 7 dec digits
      27 bin digits -> 7 dec digits
      28 bin digits -> 7 dec digits
      29 bin digits -> 8 dec digits
      30 bin digits -> 9 dec digits
      31 bin digits -> 9 dec digits
      32 bin digits -> 9 dec digits
      33 bin digits -> 9 dec digits
      34 bin digits -> 9 dec digits
      35 bin digits -> 10 dec digits
      36 bin digits -> 10 dec digits
      37 bin digits -> 10 dec digits
      38 bin digits -> 10 dec digits
      39 bin digits -> 11 dec digits
      40 bin digits -> 12 dec digits
      41 bin digits -> 12 dec digits
      42 bin digits -> 12 dec digits
      43 bin digits -> 12 dec digits
      44 bin digits -> 12 dec digits
      45 bin digits -> 13 dec digits
      46 bin digits -> 13 dec digits
      47 bin digits -> 13 dec digits
      48 bin digits -> 13 dec digits
      49 bin digits -> 14 dec digits
      50 bin digits -> 15 dec digits
      51 bin digits -> 15 dec digits
      52 bin digits -> 15 dec digits
      53 bin digits -> 15 dec digits
      54 bin digits -> 15 dec digits
      55 bin digits -> 16 dec digits
      56 bin digits -> 16 dec digits
      57 bin digits -> 16 dec digits
      58 bin digits -> 16 dec digits
      59 bin digits -> 17 dec digits
      60 bin digits -> 18 dec digits
      61 bin digits -> 18 dec digits
      62 bin digits -> 18 dec digits
      63 bin digits -> 18 dec digits
      64 bin digits -> 18 dec digits
      

      And at last do not forget to round the cut off digits !!! That means if digit after the last relevant digit is >=5 in dec than last relevant digit should be +1 ... and if it is already 9 than you must go to previous digit and so on...

    3. print exact value

      To print exact value of fractional binary number just print fractional n digits where n is the number of fractional bits because the value represented is sum of this values so the number of fractional decimals cannot be bigger than the num of fractional digits of LSB. Stuff above (bullet #2) is relevant for storing decimal number to float (or printing just relevant decimals).

      negative powers of two exact values...

      2^- 1 = 0.5
      2^- 2 = 0.25
      2^- 3 = 0.125
      2^- 4 = 0.0625
      2^- 5 = 0.03125
      2^- 6 = 0.015625
      2^- 7 = 0.0078125
      2^- 8 = 0.00390625
      2^- 9 = 0.001953125
      2^-10 = 0.0009765625
      2^-11 = 0.00048828125
      2^-12 = 0.000244140625
      2^-13 = 0.0001220703125
      2^-14 = 0.00006103515625
      2^-15 = 0.000030517578125
      2^-16 = 0.0000152587890625
      2^-17 = 0.00000762939453125
      2^-18 = 0.000003814697265625
      2^-19 = 0.0000019073486328125
      2^-20 = 0.00000095367431640625
      

      now negative powers of 10 printed by exact value style for 64 bit doubles:

      10^+ -1 = 0.1000000000000000055511151231257827021181583404541015625
              = 0.0001100110011001100110011001100110011001100110011001101b
      10^+ -2 = 0.01000000000000000020816681711721685132943093776702880859375
              = 0.00000010100011110101110000101000111101011100001010001111011b
      10^+ -3 = 0.001000000000000000020816681711721685132943093776702880859375
              = 0.000000000100000110001001001101110100101111000110101001111111b
      10^+ -4 = 0.000100000000000000004792173602385929598312941379845142364501953125
              = 0.000000000000011010001101101110001011101011000111000100001100101101b
      10^+ -5 = 0.000010000000000000000818030539140313095458623138256371021270751953125
              = 0.000000000000000010100111110001011010110001000111000110110100011110001b
      10^+ -6 = 0.000000999999999999999954748111825886258685613938723690807819366455078125
              = 0.000000000000000000010000110001101111011110100000101101011110110110001101b
      10^+ -7 = 0.0000000999999999999999954748111825886258685613938723690807819366455078125
              = 0.0000000000000000000000011010110101111111001010011010101111001010111101001b
      10^+ -8 = 0.000000010000000000000000209225608301284726753266340892878361046314239501953125
              = 0.000000000000000000000000001010101111001100011101110001000110000100011000011101b
      10^+ -9 = 0.0000000010000000000000000622815914577798564188970686927859787829220294952392578125
              = 0.0000000000000000000000000000010001001011100000101111101000001001101101011010010101b
      10^+-10 = 0.00000000010000000000000000364321973154977415791655470655996396089904010295867919921875
              = 0.00000000000000000000000000000000011011011111001101111111011001110101111011110110111011b
      10^+-11 = 0.00000000000999999999999999939496969281939810930172340963650867706746794283390045166015625
              = 0.00000000000000000000000000000000000010101111111010111111111100001011110010110010010010101b
      10^+-12 = 0.00000000000099999999999999997988664762925561536725284350612952266601496376097202301025390625
              = 0.00000000000000000000000000000000000000010001100101111001100110000001001011011110101000010001b
      10^+-13 = 0.00000000000010000000000000000303737455634003709136034716842278413651001756079494953155517578125
              = 0.00000000000000000000000000000000000000000001110000100101110000100110100001001001011101101000001b
      10^+-14 = 0.000000000000009999999999999999988193093545598986971343290729163921781719182035885751247406005859375
              = 0.000000000000000000000000000000000000000000000010110100001001001101110000110101000010010101110011011b
      10^+-15 = 0.00000000000000100000000000000007770539987666107923830718560119501514549256171449087560176849365234375
              = 0.00000000000000000000000000000000000000000000000001001000000011101011111001111011100111010101100001011b
      10^+-16 = 0.00000000000000009999999999999999790977867240346035618411149408467364363417573258630000054836273193359375
              = 0.00000000000000000000000000000000000000000000000000000111001101001010110010100101111101100010001001101111b
      10^+-17 = 0.0000000000000000100000000000000007154242405462192450852805618492324772617063644020163337700068950653076171875
              = 0.0000000000000000000000000000000000000000000000000000000010111000011101111010101000110010001101101010010010111b
      10^+-18 = 0.00000000000000000100000000000000007154242405462192450852805618492324772617063644020163337700068950653076171875
              = 0.00000000000000000000000000000000000000000000000000000000000100100111001001011101110100011101001001000011101011b
      10^+-19 = 0.000000000000000000099999999999999997524592683526013185572915905567688179926555402943222361500374972820281982421875
              = 0.000000000000000000000000000000000000000000000000000000000000000111011000001111001001010011111011011011010010101011b
      10^+-20 = 0.00000000000000000000999999999999999945153271454209571651729503702787392447107715776066783064379706047475337982177734375
              = 0.00000000000000000000000000000000000000000000000000000000000000000010111100111001010000100001100100100100100001000100011b
      

      now negative powers of 10 printed by relevant decimal digits only style (i am more used to this) for 64bit doubles:

      10^+ -1 = 0.1
      10^+ -2 = 0.01
      10^+ -3 = 0.001
      10^+ -4 = 0.0001
      10^+ -5 = 0.00001
      10^+ -6 = 0.000001
      10^+ -7 = 0.0000001
      10^+ -8 = 0.00000001
      10^+ -9 = 0.000000001
      10^+-10 = 0.0000000001
      10^+-11 = 0.00000000001
      10^+-12 = 0.000000000001
      10^+-13 = 0.0000000000001
      10^+-14 = 0.00000000000001
      10^+-15 = 0.000000000000001
      10^+-16 = 0.0000000000000001
      10^+-17 = 0.00000000000000001
      10^+-18 = 0.000000000000000001
      10^+-19 = 0.0000000000000000001
      10^+-20 = 0.00000000000000000001
      

      hope it helps :)

提交回复
热议问题