I am transferring data from an iPhone to a slave device, and the transmission requires a 16-bit data value. Now I have a floating-point value that I need to transfer, but
On iOS devices, floating-point numbers are stored with a sign bit, a biased exponent, and an encoded significand.
With 32-bit floats (`float`), the exponent is eight bits and is biased by 127, and the encoded significand is 23 bits. With 64-bit floats (`double`), the exponent is eleven bits and is biased by 1023, and the encoded significand is 52 bits.
The following describes 32-bit floats. It is from memory; I have not double-checked it. 64-bit floats are similar.
Consider a float F. Define E with `unsigned int E = (union { float f; unsigned int u; }) { F }.u;`. On an iOS device (and many other common computers), E will contain the encoding of F, that is, the bits that represent it.
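As a minimal sketch of reading the encoding, the function below copies the bytes of a float into a 32-bit integer with `memcpy`, which is a common alternative to the union. The name `float_bits` is made up for illustration, and the code assumes `float` is the 32-bit format described here.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float uses the 32-bit encoding described in the text. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: return the 32 bits that encode x. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* copy the object representation */
    return bits;
}
```

With this sketch, `float_bits(3.5f)` should give 0x40600000, the encoding worked through in this answer.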
Let s be `E>>31`. It is the sign bit. It is 0 if F is positive (including +0) and 1 if F is negative (including -0).
Let e be `E>>23 & 0xff`. That is the biased exponent. The unbiased (actual) exponent is e-127.
Let f be `E & 0x7fffff`. That is the encoded significand. If 0 < e < 255, the actual significand is 1.f₂, that is, the binary numeral formed by a 1, a binary point, and then the 23 bits of f. So, if f is 0x600000, then f is, in binary, 11000000000000000000000₂, so the actual significand is 1.11000000000000000000000₂, which is 1.11₂, which is 1+1/2+1/4 = 1.75.
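The three fields can be pulled apart in code roughly as follows. This is only a sketch: `struct float_fields`, `split_float`, and `normal_significand` are names invented here, and the code assumes the 32-bit layout described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a 32-bit float. */
struct float_fields {
    uint32_t sign;      /* s: 0 or 1 */
    uint32_t exponent;  /* e: biased exponent, 0..255 */
    uint32_t fraction;  /* f: 23-bit encoded significand */
};

struct float_fields split_float(float x)
{
    uint32_t bits;
    struct float_fields r;

    memcpy(&bits, &x, sizeof bits);   /* the encoding E */
    r.sign     = bits >> 31;
    r.exponent = (bits >> 23) & 0xff;
    r.fraction = bits & 0x7fffff;
    return r;
}

/* Actual significand when 0 < e < 255: 1.f in binary, i.e. 1 + f / 2^23. */
double normal_significand(uint32_t fraction)
{
    return 1.0 + (double)fraction / 8388608.0;  /* 8388608 = 2^23 */
}
```

For 3.5f this should yield s = 0, e = 128, f = 0x600000, and a significand of 1.75.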
Together, the number represented by this encoding is (-1)^s • 2^(e-127) • 1.f₂. So, for the encoding 0x40600000, s is 0, e is 0x80 = 128, and f is 0x600000 = 11000000000000000000000₂, so the value is (-1)^0 • 2^(128-127) • 1.75 = 1 • 2 • 1.75 = 3.5.
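The formula can be checked with a small decoder. The sketch below handles only the normal case (0 < e < 255); `pow2` and `decode_normal` are illustrative names, and the power of two is built by repeated doubling or halving to avoid depending on the math library.

```c
#include <stdint.h>

/* 2^n as a double, by repeated doubling or halving (exact for the n used here). */
static double pow2(int n)
{
    double p = 1.0;
    for (; n > 0; n--) p *= 2.0;
    for (; n < 0; n++) p *= 0.5;
    return p;
}

/* Hypothetical decoder for encodings with 0 < e < 255 only. */
double decode_normal(uint32_t bits)
{
    uint32_t s = bits >> 31;
    uint32_t e = (bits >> 23) & 0xff;
    uint32_t f = bits & 0x7fffff;

    double significand = 1.0 + (double)f / 8388608.0;   /* 1.f in binary */
    double magnitude = pow2((int)e - 127) * significand;
    return s ? -magnitude : magnitude;                   /* apply (-1)^s */
}
```

Under these assumptions, `decode_normal(0x40600000)` evaluates to 3.5.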
There are some special cases. If e is 0, then the actual significand is 0.f₂ instead of 1.f₂. That is, the implicit 1 is changed to 0. In this case the exponent used is -126, the same as for e = 1, rather than -127. Note that if f is zero, then the value represented is 0. The sign is still meaningful with a floating-point zero; +0 and -0 have slightly different behaviors.
If e is 255 and f is 0, the value represented is infinite, either +infinity or -infinity, depending on the sign.
If e is 255 and f is not 0, the value is a NaN with some implementation-defined semantics.
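The special cases can be told apart from e and f alone. The following is a sketch with invented names (`enum float_kind`, `classify_bits`, `subnormal_value`); the subnormal helper relies on the IEEE 754 rule that an encoding with e = 0 represents 0.f₂ • 2^(-126), which equals f • 2^(-149).

```c
#include <stdint.h>

/* Hypothetical categories for a 32-bit float encoding. */
enum float_kind { KIND_ZERO, KIND_SUBNORMAL, KIND_NORMAL, KIND_INFINITY, KIND_NAN };

enum float_kind classify_bits(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xff;
    uint32_t f = bits & 0x7fffff;

    if (e == 0)   return f == 0 ? KIND_ZERO : KIND_SUBNORMAL;
    if (e == 255) return f == 0 ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMAL;
}

/* Value of an encoding with e == 0: (-1)^s * 0.f (binary) * 2^-126 = (-1)^s * f * 2^-149. */
double subnormal_value(uint32_t bits)
{
    uint32_t s = bits >> 31;
    uint32_t f = bits & 0x7fffff;
    double magnitude = (double)f * 0x1p-149;  /* hexadecimal float literal for 2^-149 */
    return s ? -magnitude : magnitude;
}
```

The sign bit is ignored by `classify_bits`, so +0 and -0 both classify as zero and both infinities classify as infinity.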
To encode a floating-point value, you determine the sign, then calculate the largest power of two not greater than the value’s magnitude. That power of two gives you the unbiased exponent. Then you divide the value by the power of two and round the quotient to fit in the significand, and that gives you the significand. There are some special cases: when the rounding pushes the significand to 2 (you have to adjust the exponent), when the exponent is large enough to overflow, and when it is small enough to underflow into the denormal range (below the range where the biased exponent is 1 or greater).
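The encoding steps can be sketched in C as below. This is not a definitive implementation: `encode_float` is a hypothetical name, round-to-nearest-even is an assumed rounding rule, and the function simply reports failure (returns -1) for NaN, for values that would overflow, and for values that would fall into the denormal range. It also encodes -0.0 as +0.

```c
#include <stdint.h>

/* Hypothetical encoder: writes the 32-bit float encoding of value to *out.
   Returns 0 on success, -1 for inputs this sketch does not handle. */
int encode_float(double value, uint32_t *out)
{
    uint32_t sign = 0;
    int exponent = 0;

    if (value != value) return -1;             /* NaN: not handled */
    if (value < 0) { sign = 1; value = -value; }
    if (value == 0) { *out = 0; return 0; }    /* note: -0.0 becomes +0 here */

    /* Scale value into [1, 2); exponent becomes the unbiased exponent. */
    while (value >= 2.0) {
        value *= 0.5;
        if (++exponent > 127) return -1;       /* overflow (also catches infinity) */
    }
    while (value < 1.0) {
        value *= 2.0;
        if (--exponent < -126) return -1;      /* would be denormal: not handled */
    }

    /* Round the significand to 23 fraction bits (assumed: nearest, ties to even). */
    double scaled = value * 8388608.0;         /* in [2^23, 2^24) */
    uint32_t q = (uint32_t)scaled;             /* truncate */
    double rem = scaled - (double)q;
    if (rem > 0.5 || (rem == 0.5 && (q & 1))) q++;

    /* Rounding pushed the significand to 2: renormalize. */
    if (q == (1u << 24)) {
        q >>= 1;
        if (++exponent > 127) return -1;
    }

    *out = (sign << 31) | ((uint32_t)(exponent + 127) << 23) | (q & 0x7fffff);
    return 0;
}
```

For example, encoding 3.5 this way should reproduce 0x40600000, and a value just below 2 such as 1.99999999 exercises the case where rounding carries into the exponent.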