Writing IEEE 754-1985 double as ASCII on a limited 16 bytes string

Posted by 跟風遠走 on 2019-12-01 06:27:43
chux

This is trickier than it first appears.

Given the various corner cases, it seems best to try at a high precision and then work down as needed.

  1. A negative number prints like the corresponding positive number, but with 1 less digit of precision because the '-' takes a character.

  2. A '+' sign is not needed at the beginning of the string nor after the 'e'.

  3. The '.' is not needed: the significand can be written as an integer with an adjusted exponent.

  4. It is dangerous to use anything other than sprintf() for the mathematical part, given so many corner cases. With the various rounding modes, FLT_EVAL_METHOD, etc., leave the heavy lifting to well-established functions.

  5. When an attempt is too long by more than 1 character, iterations can be saved. E.g., if an attempt with precision 14 results in a width of 20, there is no need to try precision 13 and 12; go straight to 11.

  6. Scaling of the exponent due to the removal of the '.' must be done after sprintf(), to 1) avoid injecting computational error and 2) avoid scaling a double below its minimum exponent.

  7. Maximum relative error is less than 1 part in 2,000,000,000 as with -1.00000000049999e-200. Average relative error about 1 part in 50,000,000,000.

  8. The highest precision, 14 digits, occurs with numbers like 12345678901234e1, so start with 16-2 digits.
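
The error bound in item 7 can be checked numerically. The sketch below is not part of the answer's code: `relative_error` is a hypothetical helper that reads the shortened text back with strtod() and compares it with the original value. It assumes strtod() accepts a significand without a '.', which the standard decimal syntax allows.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

// Hypothetical helper: relative difference between `value` and the double
// obtained by reading `text` back with strtod(). `value` must be nonzero.
static double relative_error(const char *text, double value) {
  assert(text != NULL && value != 0.0);
  double parsed = strtod(text, NULL);
  return fabs((parsed - value) / value);
}
```

For the worst case quoted in item 7, the 16-character text should come out as `-1000000000e-209`, and `relative_error("-1000000000e-209", -1.00000000049999e-200)` is just under 5e-10, i.e. 1 part in 2,000,000,000.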


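#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

// Rewrite "d.ddde±xx" in place as "ddddeN": drop the '.', the '+' and any
// leading exponent zeros, and lower the exponent by the fraction digit count.
static size_t shrink(char *fp_buffer) {
  int lead, expo;
  long long mant;
  int n0, n1;
  int n = sscanf(fp_buffer, "%d.%n%lld%ne%d", &lead, &n0, &mant, &n1, &expo);
  if (n != 3) {
    // No fraction digits, as printed by "%.0e": only the exponent shrinks.
    n = sscanf(fp_buffer, "%de%d", &lead, &expo);
    assert(n == 2);
    return sprintf(fp_buffer, "%de%d", lead, expo);
  }
  return sprintf(fp_buffer, "%d%0*llde%d", lead, n1 - n0, mant,
          expo - (n1 - n0));
}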

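// Returns 0 on success, 1 if value is not finite, 2 if width < 5,
// 3 if no precision fits in width.
int x16printf(char *dest, size_t width, double value) {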
  if (!isfinite(value)) return 1;

  if (width < 5) return 2;
  if (signbit(value)) {
    value = -value;
    strcpy(dest++, "-");
    width--;
  }
  int precision = width - 2;
  while (precision > 0) {
    char buffer[width + 10];
    // %.*e prints 1 digit, '.' and then `precision - 1` digits
    snprintf(buffer, sizeof buffer, "%.*e", precision - 1, value);
    size_t n = shrink(buffer);
    if (n <= width) {
      strcpy(dest, buffer);
      return 0;
    }
    if (n > width + 1) precision -= n - width - 1;
    else precision--;
  }
  return 3;
}

Test code

double rand_double(void) {
  union {
    double d;
    unsigned char uc[sizeof(double)];
  } u;
  do {
    for (size_t i = 0; i < sizeof(double); i++) {
      u.uc[i] = rand();
    }
  } while (!isfinite(u.d));
  return u.d;
}

void x16printf_test(double value) {
  printf("%-27.*e", 17, value);
  char buf[16+1];
  buf[0] = 0;
  int y = x16printf(buf, sizeof buf - 1, value);
  printf(" %d\n", y);
  printf("'%s'\n", buf);
}


int main(void) {
  for (int i = 0; i < 10; i++)
    x16printf_test(rand_double());
}

Output

-1.55736829786841915e+118   0
'-15573682979e108'
-3.06117209691283956e+125   0
'-30611720969e115'
8.05005611774356367e+175    0
'805005611774e164'
-1.06083057094522472e+132   0
'-10608305709e122'
3.39265065244054607e-209    0
'33926506524e-219'
-2.36818580315246204e-244   0
'-2368185803e-253'
7.91188576978592497e+301    0
'791188576979e290'
-1.40513111051994779e-53    0
'-14051311105e-63'
-1.37897140950449389e-14    0
'-13789714095e-24'
-2.15869805640288206e+125   0
'-21586980564e115'
chux

For finite floating-point values, the printf() format specifier "%e" is a good match for the requirement
"A floating point number shall be ... with an "E" or "e" to indicate the start of the exponent"

[−]d.ddd...ddde±dd

The sign is present for negative numbers and, most likely, for -0.0. The exponent has at least 2 digits.

If we assume DBL_MAX < 1e1000 (safe for IEEE 754-1985 double), then the code below works in all cases: 1 optional sign, 1 lead digit, '.', 8 digits, 'e', sign, up to 3 digits.

(Note: the "16 bytes maximum" does not seem to include the C string's terminating null character. Adjust by 1 if needed.)

// Room for 16 printable characters.
char buf[16+1];
int n = snprintf(buf, sizeof buf, "%.*e", 8, x);
assert(n >= 0 && n < sizeof buf);
puts(buf);

But this always reserves room for the optional sign and a third exponent digit, so values that need neither waste characters.
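
To see how many characters a fixed precision leaves unused, the output length can be measured without writing anything. This is a minimal sketch; `e_length` is a hypothetical helper, and the numbers quoted after it assume a C library that prints a 2-digit minimum exponent.

```c
#include <assert.h>
#include <stdio.h>

// Number of characters "%.*e" would produce for x, without writing anything.
// snprintf() with a null buffer and size 0 only reports the length (C99).
static int e_length(double x, int precision) {
  assert(precision >= 0);
  return snprintf(NULL, 0, "%.*e", precision, x);
}
```

Under that assumption, `e_length(1.0, 8)` is 14 while `e_length(-1.0e100, 8)` is 16, so a fixed precision of 8 leaves two characters unused for a small positive value.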

The tricky part is that, because of rounding, the boundary between numbers that use 2 exponent digits and those that use 3 is fuzzy. Even when testing for negative numbers, -0.0 is an issue.
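
The -0.0 issue comes from the fact that -0.0 compares equal to +0.0, so a test like `x < 0.0` misses it even though the sign bit is set. A minimal sketch, with `sign_width` as a hypothetical helper name:

```c
#include <assert.h>
#include <math.h>

// Characters "%e" is expected to spend on the sign of a finite x:
// 1 when the sign bit is set (including -0.0), otherwise 0.
static int sign_width(double x) {
  assert(isfinite(x));
  return signbit(x) ? 1 : 0;
}
```

Here `sign_width(-0.0)` is 1 although `-0.0 < 0.0` is false, which is why the candidate code relies on signbit().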

[Edit] A test for very small numbers is also needed.

Candidate:

// Room for 16 printable characters.
char buf[16+1];
assert(isfinite(x)); // for now, only address finite numbers

int precision = 8+1+1;
if (signbit(x)) precision--;  // Or simply `if (x <= 0.0) precision--;`
if (fabs(x) >= 9.99999999e99) precision--; // some refinement possible here.
else if (fabs(x) <= 1.0e-99) precision--;
int n = snprintf(buf, sizeof buf, "%.*e", precision, x);
assert(n >= 0 && n < sizeof buf);
puts(buf);

Additional concerns:

Some compilers print at least 3 exponent digits.
The maximum number of significant decimal digits needed for an IEEE 754-1985 double depends on the definition of "needed", but is likely about 15-17. See the related question "Printf width specifier to maintain precision of floating-point value".
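
One way to make "needed" concrete is to ask whether a value survives a print-and-parse round trip at a given number of significant digits. The following is only a sketch; `round_trips` is a hypothetical helper, and it assumes a correctly rounding printf()/strtod() pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical check: does x survive printing with `digits` significant
// decimal digits and reading the text back with strtod()?
static bool round_trips(double x, int digits) {
  char text[64];
  assert(digits >= 1 && digits <= 40);
  int n = snprintf(text, sizeof text, "%.*e", digits - 1, x);
  assert(n > 0 && (size_t)n < sizeof text);
  return strtod(text, NULL) == x;
}
```

For example, 0.30000000000000004 round-trips with 17 digits but not with 15, whereas 0.1 already round-trips with 15. With 16 characters in total, neither 15 nor 17 significant digits fit next to an exponent, so some loss is unavoidable in decimal.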

Candidate 2: One time test for too long an output

// Room for N printable characters.
#define N 16
char buf[N+1];
assert(isfinite(x)); // for now, only address finite numbers

int precision = N - 2 - 4;  // 1.xxxxxxxxxxe-dd
if (signbit(x)) precision--;
int n = snprintf(buf, sizeof buf, "%.*e", precision, x);
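if (n >= (int) sizeof buf) {
  // Too long by n - N characters: drop that many digits. The casts keep the
  // "*" argument an int, as snprintf() requires.
  n = snprintf(buf, sizeof buf, "%.*e", precision - (n - (int) sizeof buf) - 1, x);
}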
assert(n >= 0 && n < sizeof buf);
puts(buf);

The C library formatters have no direct format for your requirement. At a simple level, if you can accept the characters wasted by the standard %g format (e20 is written e+20, or e+020 by libraries that print at least 3 exponent digits, wasting 1 or 2 characters), you can:

  • generate the output for the %.17g format
  • if it is longer than 16 characters, compute the precision that would bring it down to 16
  • generate the output for that format.

Code could look like:

void encode(double f, char *buf) {
    char line[40];
    int prec;
    int l;

    l = sprintf(line, "%.17g", f);
    if (l > 16) {
        prec = 33 - strlen(line);
        l = sprintf(line, "%.*g", prec, f);
        while(l > 16) {
            /* putc('.', stdout);*/
            prec -=1;
            l = sprintf(line, "%.*g", prec, f);
        }
    }
    strcpy(buf, line);
}

If you really want to be optimal (meaning writing e30 instead of e+30 or e+030), you could use the %1.16e format and post-process the output. Rationale (for positive numbers):

  • the %1.16e format allows you to separate the mantissa and the exponent (base 10)
  • if the exponent is between size-2 (included) and size (excluded): just round the mantissa correctly to its integer part and display it
  • if the exponent is between 0 and size-2 (both included): display the rounded mantissa with the dot correctly placed
  • if the exponent is between -1 and -3 (both included): start with a dot, add any needed zeros and fill with the rounded mantissa
  • else use a e format with minimal size for the exponent part and fill with the rounded mantissa

Corner cases:

  • for negative numbers, write a leading '-' and then the representation of the opposite number using size-1
  • rounding: if the first rejected digit is >= 5, increase the preceding digit, and iterate if it was a 9. Process 9.9999999999... as a special case rounded to 10

Possible code:

void clean(char *mant) {
    char *ix = mant + strlen(mant) - 1;
    while(('0' == *ix) && (ix > mant)) {
        *ix-- = '\0';
    }
    if ('.' == *ix) {
        *ix = '\0';
    }
}

int add1(char *buf, int n) {
    if (n < 0) return 1;
    if (buf[n] == '9') {
        buf[n] = '0';
        return add1(buf, n-1);
    }
    else {
        buf[n] += 1;
    }
    return 0;
}

int doround(char *buf, unsigned int n) {
    char c;
    if (n >= strlen(buf)) return 0;
    c = buf[n];
    buf[n] = 0;
    if ((c >= '5') && (c <= '9')) return add1(buf, n-1);
    return 0;
}

int roundat(char *buf, unsigned int i, int iexp) {
    if (doround(buf, i) != 0) {
        iexp += 1;
        switch(iexp) {
            case -2:
                strcpy(buf, ".01");
                break;
            case -1:
                strcpy(buf, ".1");
                break;
            case 0:
                strcpy(buf, "1.");
                break;
            case 1:
                strcpy(buf, "10");
                break;
            case 2:
                strcpy(buf, "100");
                break;
            default:
                sprintf(buf, "1e%d", iexp);
        }
        return 1;
    }
    return 0;
}

void encode(double f, char *buf, int size) {
    char line[40];
    char *mant = line + 1;
    int iexp, lexp, i;
    char exp[6];

    if (f < 0) {
        f = -f;
        size -= 1;
        *buf++ = '-';
    }
    sprintf(line, "%1.16e", f);
    if (line[0] == '-') {   /* -0.0 is not < 0 but still prints a '-' */
        f = -f;
        size -= 1;
        *buf++ = '-';
        sprintf(line, "%1.16e", f);
    }
    *mant = line[0];
    i = strcspn(mant, "eE");
    mant[i] = '\0';
    iexp = strtol(mant + i + 1, NULL, 10);
    lexp = sprintf(exp, "e%d", iexp);
    if ((iexp >= size) || (iexp < -3)) {
        i = roundat(mant, size - 1 -lexp, iexp);
        if(i == 1) {
            strcpy(buf, mant);
            return;
        }
        buf[0] = mant[0];
        buf[1] = '.';
        strncpy(buf + i + 2, mant + 1, size - 2 - lexp);
        buf[size-lexp] = 0;
        clean(buf);
        strcat(buf, exp);
    }
    else if (iexp >= size - 2) {
        roundat(mant, iexp + 1, iexp);
        strcpy(buf, mant);
    }
    else if (iexp >= 0) {
        i = roundat(mant, size - 1, iexp);
        if (i == 1) {
            strcpy(buf, mant);
            return;
        }
        strncpy(buf, mant, iexp + 1);
        buf[iexp + 1] = '.';
        strncpy(buf + iexp + 2, mant + iexp + 1, size - iexp - 1);
        buf[size] = 0;
        clean(buf);
    }
    else {
        int j;
        i = roundat(mant, size + 1 + iexp, iexp);
        if (i == 1) {
            strcpy(buf, mant);
            return;
        }
        buf[0] = '.';
        for(j=0; j< -1 - iexp; j++) {
            buf[j+1] = '0';
        }
        if ((i == 1) && (iexp != -1)) {
            buf[-iexp] = '1';
            buf++;
        }
        strncpy(buf - iexp, mant, size + 1 + iexp);
        buf[size] = 0;
        clean(buf);
    }
}

I think your best option is to use printf("%.17g\n", d); to generate an initial answer and then trim it. The simplest way to trim it is to drop digits from the end of the mantissa until it fits. This actually works very well but will not minimize the error because you are truncating instead of rounding to nearest.

A better solution would be to examine the digits to be removed, treating them as an n-digit number between 0.0 and 1.0, so '49' would be 0.49. If their value is less than 0.5 then just remove them. If their value is greater than 0.50 then increment the printed value in its decimal form. That is, add one to the last digit, with wrap-around and carry as needed. Any trailing zeroes that are created should be trimmed.

The only time this becomes a problem is if the carry propagates all the way to the first digit and overflows it from 9 to zero. This might be impossible, but I don't know for sure. In this case (+9.99999e17) the answer would be +1e18, so as long as you have tests for that case you should be fine.

So, print the number, split it into sign/mantissa strings and an exponent integer, and string manipulate them to get your result.
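
The rounding step described above can be sketched on a plain digit string. This is an illustration only, not the poster's code: `round_digits` is a hypothetical helper, and it assumes that a first dropped digit of 5 or more rounds up (so ties go up).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Round the decimal digit string `digits` (no sign, no '.') to `keep` digits,
// in place. Returns true when a carry ran past the first digit; the result is
// then "1000..." and the caller must add 1 to the decimal exponent.
// Assumption: a first dropped digit of '5' or more rounds up (ties go up).
static bool round_digits(char *digits, size_t keep) {
  size_t len = strlen(digits);
  assert(keep >= 1);
  if (keep >= len) return false;      // nothing to drop
  bool up = digits[keep] >= '5';
  digits[keep] = '\0';                // truncate
  if (!up) return false;
  for (size_t i = keep; i-- > 0;) {   // propagate the carry leftwards
    if (digits[i] != '9') {
      digits[i]++;
      return false;
    }
    digits[i] = '0';
  }
  digits[0] = '1';                    // 99...9 became 100...0, same width
  return true;
}
```

With this helper, "12349" kept to 3 digits becomes "123", "12350" becomes "124", and "99999" kept to 2 digits becomes "10" with the overflow flag set. The overflow case is therefore reachable and needs the exponent adjustment mentioned above; trimming trailing zeroes is left to the caller.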

Printing in decimal cannot work because for some numbers a 17 digit mantissa is needed which uses up all of your space without printing the exponent. To be more precise, printing a double in decimal sometimes requires more than 16 characters to guarantee accurate round-tripping.

Instead you should print the underlying binary representation using hexadecimal. This will use exactly 16 bytes, assuming that a null-terminator isn't needed.
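
A minimal sketch of that idea follows. The function names are hypothetical, and the code assumes that double is a 64-bit IEEE 754 type sharing its byte order with uint64_t, which holds on common platforms.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Write the 64-bit pattern of x as exactly 16 hex digits plus a terminator.
// Assumes double is a 64-bit IEEE 754 type, as in the question.
static void double_to_hex16(double x, char out[17]) {
  uint64_t bits;
  assert(sizeof bits == sizeof x);
  memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bits
  snprintf(out, 17, "%016" PRIx64, bits);
}

// Inverse of double_to_hex16(); `text` must hold 16 hex digits.
static double hex16_to_double(const char *text) {
  uint64_t bits = 0;
  double x;
  int n = sscanf(text, "%16" SCNx64, &bits);
  assert(n == 1);
  memcpy(&x, &bits, sizeof x);
  return x;
}
```

For example, 1.0 is written as `3ff0000000000000` and -0.0 as `8000000000000000`, and both read back exactly.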

If you want to print the results using fewer than 16 bytes then you can basically uuencode it. That is, use a base larger than 16 so that you can squeeze more bits into each character. If you use 64 different characters (six bits each) then a 64-bit double can be printed in eleven characters. Not very readable, but tradeoffs must be made.
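
The eleven-character variant could look like the sketch below. The alphabet and function names are assumptions made for illustration; this is not standard Base64 framing, just 6 bits per character over the 64-bit pattern of a 64-bit IEEE 754 double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Assumed 64-symbol alphabet; any 64 distinct printable characters would do.
static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode the 64 bits of x as 11 characters of 6 bits each (66 bits of room,
// so the first character carries only 4 bits). Assumes a 64-bit double.
static void double_to_base64(double x, char out[12]) {
  uint64_t bits;
  assert(sizeof bits == sizeof x);
  memcpy(&bits, &x, sizeof bits);
  for (int i = 10; i >= 0; i--) {  // least significant 6 bits go last
    out[i] = ALPHABET[bits & 63];
    bits >>= 6;
  }
  out[11] = '\0';
}

// Inverse of double_to_base64(); `text` must be 11 alphabet characters.
static double base64_to_double(const char *text) {
  uint64_t bits = 0;
  double x;
  assert(strlen(text) == 11);
  for (int i = 0; i < 11; i++) {
    const char *p = strchr(ALPHABET, text[i]);
    assert(p != NULL);
    bits = (bits << 6) | (uint64_t)(p - ALPHABET);
  }
  memcpy(&x, &bits, sizeof x);
  return x;
}
```

Any finite double passed through `double_to_base64()` and back through `base64_to_double()` compares equal to the original, at the cost of a text that no longer looks like a number.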
