ieee-754

Converting IEEE 754 from bit stream into float in JavaScript

筅森魡賤 submitted on 2021-02-06 13:50:27
Question: I have serialized a 32-bit floating-point number using the Go function math.Float32bits, which returns the IEEE 754 binary representation of the float as a 32-bit unsigned integer. That integer is then serialized and read into JavaScript as a byte array. For example, here is an actual number: float: 2.8088086 as byte array: 40 33 c3 85 as hex: 0x4033c385 There is a demo converter that displays the same numbers. I need to get that same floating number back from byte array in
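The example values in the question can be cross-checked with a short sketch. It is written in Python rather than JavaScript, and it assumes the four bytes are in big-endian order, as the hex value 0x4033c385 suggests. The helper names `bytes_to_float32` and `float32_to_bytes` are hypothetical and do not come from the question.

```python
import struct


def bytes_to_float32(raw: bytes) -> float:
    """Interpret 4 big-endian bytes as an IEEE 754 binary32 value."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    # ">f" means: big-endian, 32-bit IEEE 754 float.
    (value,) = struct.unpack(">f", raw)
    return value


def float32_to_bytes(value: float) -> bytes:
    """Round a Python float to binary32 and return its 4 big-endian bytes."""
    return struct.pack(">f", value)


if __name__ == "__main__":
    raw = bytes.fromhex("4033c385")  # bytes 40 33 c3 85 from the question
    print(bytes_to_float32(raw))
    print(float32_to_bytes(2.8088086).hex())
```

In JavaScript, the analogous step is to wrap the bytes in an ArrayBuffer and read them with `DataView.prototype.getFloat32`, which reads big-endian unless its second argument is `true`.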

What would cause the C/C++ <, <=, and == operators to return true if either argument is NaN?

血红的双手。 submitted on 2021-02-06 06:38:26
Question: My understanding of the rules of IEEE-754 floating-point comparisons is that all comparison operators except != will return false if either or both arguments are NaN, while the != operator will return true. I can easily reproduce this behavior with a simple standalone test: for (int ii = 0; ii < 4; ++ii) { float a = (ii & 1) != 0 ? NAN : 1.0f; float b = (ii & 2) != 0 ? NAN : 2.0f; #define TEST(OP) printf("%4.1f %2s %4.1f => %s\n", a, #OP, b, a OP b ? "true" : "false"); TEST(<) TEST(>) TEST(<=
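The comparison rule described in the question can be tabulated with a minimal sketch. It uses Python instead of C, on the assumption that Python's float comparisons follow the same IEEE 754 semantics; the helper name `compare` is hypothetical. It shows only the expected behavior and does not reproduce the anomaly the question asks about.

```python
import math
import operator

# The six relational operators covered by the rule in the question.
OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def compare(a: float, b: float) -> dict:
    """Return the result of every relational operator applied to (a, b)."""
    return {name: op(a, b) for name, op in OPS.items()}


if __name__ == "__main__":
    # Same four operand combinations as the loop in the question.
    for a, b in [(1.0, 2.0), (math.nan, 2.0), (1.0, math.nan), (math.nan, math.nan)]:
        print(a, b, compare(a, b))
```

Whenever at least one operand is NaN, every entry except `!=` is False, which matches the rule as the question states it.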

correctly-rounded double-precision division

拟墨画扇 submitted on 2021-02-05 06:46:26
Question: I am using the following algorithm for double-precision division and am trying to make it correctly rounded in a software emulation of floating point. Let a be the dividend and b the divisor. All operations are performed in Q2.62. The initial approximation to the reciprocal is [formula not preserved]. b/2 is the significand of b with its implicit bit added, shifted right by one. In what follows, a or b denotes the significand of a or b with its implicit bit added. The [expression not preserved] is approximated with

Convert IEEE float to TI TMS320C30 32bits float in python

穿精又带淫゛_ submitted on 2021-01-29 07:50:52
Question: I need to convert a Python float to the TI DSP TMS320C30 float representation, following this convention: http://www.ti.com/lit/an/spra400/spra400.pdf#page=13 I've tried a few things, but I can't seem to wrap my head around the proposed algorithm. I also found a C version of the algorithm, but it looks like a version that runs on the TI DSP itself, so there are operations that I can't figure out (reversing the sign, for instance). I have a very naive implementation below, but it doesn't work... #

C++ static assert of IEEE754

烂漫一生 submitted on 2021-01-28 06:37:40
Question: How can I write a static assert that the floating-point representation conforms to IEEE 754? My idea was something like this: static unsigned char c[8] = { 0, 0, 0, 0, 0, 0xd0, 0x84, 0x40 }; static double d= *reinterpret_cast<double *>(c); BOOST_STATIC_ASSERT(d==666.); But it doesn't work :( I should point out that my compiler does not support C++11 (I use Visual Studio 2008), so I don't have regular static asserts. Answer 1: First, note that due to compiler idiosyncrasies you can't reliably assert that floating point

FP: invalid operation: contradiction between C (UB) and IEEE 754 (WDB)?

北城以北 submitted on 2021-01-07 01:06:06
Question: N2479 C17..C2x working draft, February 5, 2020, ISO/IEC 9899:202x (E): 6.3.1.4 Real floating and integer: 1 When a finite value of standard floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined. IEEE 754-2019: 5.8 Details of conversions from floating-point to integer formats: When a NaN or infinite
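The cases the two quoted passages are concerned with (NaN, infinity, and a truncated value that does not fit the integer type) can be made explicit with a guarded conversion. This is a minimal Python sketch, not C. It assumes a 32-bit signed target type, and the helper name `checked_to_int32` is hypothetical. Returning `None` is an illustrative choice that neither standard prescribes.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def checked_to_int32(x: float):
    """Truncate x toward zero; return None if the result is not a valid int32."""
    # NaN and infinities have no integral value at all.
    if math.isnan(x) or math.isinf(x):
        return None
    truncated = math.trunc(x)  # discards the fractional part
    # The integral part must be representable in the target type.
    if truncated < INT32_MIN or truncated > INT32_MAX:
        return None
    return truncated


if __name__ == "__main__":
    for value in (-3.9, 2147483647.5, 2147483648.0, math.nan, math.inf):
        print(value, checked_to_int32(value))
```

The check is applied after truncation, so a value such as 2147483647.5 is accepted: its integral part still fits even though the value itself exceeds INT32_MAX.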