How to add and subtract 16 bit floating point half precision numbers?
Question
How do I add and subtract 16-bit floating-point (half-precision) numbers? Say I need to add or subtract 1 10000 0000000000 and 1 01111 1111100000, in 2's complement form.

Answer 1:
Assuming you are using a representation with denormals similar to that of IEEE single/double precision, compute the sign as (-1)^S, the mantissa as 1.M if E != 0 and 0.M if E == 0, and the exponent as E - (2^(n-1) - 1), where n is the number of exponent bits (so E - 15 for the 5-bit exponent here; under IEEE 754 a denormal with E == 0 uses the same exponent as E == 1, that is -14). Operate on these natural representations, then convert the result back to the 16-bit format.

sign1 = -1 mantissa1 = 1.0
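The decode, operate, re-encode recipe from the answer can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: it assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits with bias 15, 10 mantissa bits) and round-to-nearest-even, and it uses exact `Fraction` arithmetic in place of manually aligning mantissas. The names `decode`, `encode`, `half_add`, and `half_sub` are hypothetical helpers introduced only for this example; infinity/NaN inputs are rejected and the sign of a zero result is not preserved.

```python
from fractions import Fraction

EXP_BITS = 5
MAN_BITS = 10
BIAS = (1 << (EXP_BITS - 1)) - 1  # 2^(n-1) - 1 = 15 (assumed IEEE 754 bias)


def decode(bits):
    """Turn a 16-bit pattern into an exact Fraction (finite values only)."""
    s = (bits >> 15) & 1
    e = (bits >> MAN_BITS) & 0x1F
    m = bits & 0x3FF
    if e == 0x1F:
        raise ValueError("infinity/NaN is not handled in this sketch")
    if e == 0:
        mantissa = Fraction(m, 1 << MAN_BITS)      # 0.M (denormal)
        exponent = 1 - BIAS
    else:
        mantissa = 1 + Fraction(m, 1 << MAN_BITS)  # 1.M (normal)
        exponent = e - BIAS
    sign = -1 if s else 1
    return sign * mantissa * Fraction(2) ** exponent


def encode(value):
    """Round an exact Fraction to the nearest 16-bit pattern (ties to even)."""
    sign_bit = 0x8000 if value < 0 else 0
    mag = abs(value)
    if mag == 0:
        return sign_bit
    # floor(log2(mag)), clamped to the denormal exponent
    exp = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** exp > mag:
        exp -= 1
    exp = max(exp, 1 - BIAS)
    # Significand in units of 2^-10; round() on a Fraction rounds half to even
    q = round(mag / Fraction(2) ** exp * (1 << MAN_BITS))
    # Adding q (which includes the hidden bit) lets a rounding carry
    # propagate into the exponent field automatically.
    bits = ((exp + BIAS - 1) << MAN_BITS) + q
    if bits >= 0x7C00:
        bits = 0x7C00  # overflow to infinity
    return sign_bit | bits


def half_add(a_bits, b_bits):
    return encode(decode(a_bits) + decode(b_bits))


def half_sub(a_bits, b_bits):
    return encode(decode(a_bits) - decode(b_bits))


a = 0b1_10000_0000000000  # -2.0
b = 0b1_01111_1111100000  # -1.96875
print(f"{half_add(a, b):016b}")  # 1100001111110000  (-3.96875)
print(f"{half_sub(a, b):016b}")  # 1010100000000000  (-0.03125)
```

Working on exact rationals keeps the sketch short because rounding happens only once, in `encode`; a hardware-style implementation would instead shift the smaller operand's mantissa to align exponents, add or subtract the integers, and then renormalize.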