floating-accuracy

Floating point addition - giving strange result..!

时间秒杀一切 posted on 2019-12-01 11:49:36
When executing the following code: public class FPoint { public static void main(String[] args) { float f = 0.1f; for(int i = 0; i<9; i++) { f += 0.1f; } System.out.println(f); } } the following output is displayed: 1.0000001. But the output should be 1.0000000, right? Correct me if I'm wrong. 0.1 is not really "0.1" under the IEEE 754 standard. As a float, 0.1 is encoded as 0 01111011 10011001100110011001101, where 0 is the sign (positive), 01111011 is the exponent (123, and 123 - 127 = -4, where 127 is the IEEE 754 bias), and 10011001100110011001101 is the mantissa. To convert the mantissa to a decimal number we
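The bit pattern quoted in the answer can be checked mechanically. The following is a minimal sketch in Python rather than Java. It assumes that `struct` packs `"f"` as a 4-byte IEEE 754 single, and the helper names `f32`, `decode` and `add_tenth_ten_times` are made up for this illustration.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def decode(x):
    """Return (sign, biased exponent, mantissa) of x stored as a single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def add_tenth_ten_times():
    """Mimic the Java loop: start at 0.1f, then add 0.1f nine more times."""
    tenth = f32(0.1)
    f = tenth
    for _ in range(9):
        f = f32(f + tenth)  # round to single precision after every addition
    return f

sign, exponent, mantissa = decode(0.1)
# Sign, exponent and mantissa fields as bit strings
print(sign, format(exponent, "08b"), format(mantissa, "023b"))
# Value actually stored: (1 + mantissa / 2**23) * 2**(exponent - 127)
print((1 + mantissa / 2**23) * 2.0 ** (exponent - 127))
# Result of the repeated addition in single precision
print(add_tenth_ten_times())
```

The stored value is slightly above 0.1, and each addition rounds again, so the final sum lands one unit in the last place above 1.0.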

float strange imprecision error in c [duplicate]

拜拜、爱过 posted on 2019-12-01 11:41:04
This question already has an answer here: Is floating point math broken? (31 answers). Something strange happened to me today: when I compile and run this code, the output isn't what I expected. The code simply adds floating-point values to an array of float and then prints them out. The code: int main(){ float r[10]; int z; int i=34; for(z=0;z<10;z++){ i=z*z*z; r[z]=i; r[z]=r[z]+0.634; printf("%f\n",r[z]); } } The output: 0.634000 1.634000 8.634000 27.634001 64.634003 125.634003 216.634003 343.634003 512.633972 729.633972 Note that from 27 onward, extra digits appear after the .634
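The printed values can be reproduced by rounding each stored result to single precision. This is a rough Python sketch of the C loop, not the original program. `f32` and `cube_plus_offset` are hypothetical helper names, and the sketch assumes that the C expression `r[z] + 0.634` is evaluated in double and then narrowed to float on assignment.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def cube_plus_offset(z):
    """Mimic r[z] = z*z*z; r[z] = r[z] + 0.634; with r declared as float[]."""
    r = f32(float(z * z * z))  # these small integers are exact in a float
    return f32(r + 0.634)      # sum formed in double, then stored as a float

for z in range(10):
    # "%f" prints six decimals, like printf("%f") in C
    print("%f" % cube_plus_offset(z))
```

A float carries about 7 significant decimal digits, so the larger the integer part, the fewer correct digits remain for the fraction.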

Floating point mismatch between Visual Studio 2008 and 2013

[亡魂溺海] posted on 2019-12-01 11:34:13
After upgrading a C++ project to Visual Studio 2013, the program's results changed because of the different floating-point behavior of the new VC compiler. The floating-point model is set to /fp:precise.
In Visual Studio 2008 (v9.0):
float f = 0.4f; // produces f = 0.400000001
float f6 = 0.400000006f; // produces f6 = 0.400000001
In Visual Studio 2013 (v12.0):
float f = 0.4f; // produces f = 0.400000006
float f1 = 0.40000001f; // produces f1 = 0.400000006
The project settings are identical (converted). I understand that the floating-point model allows some latitude, but I don't like that
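As a cross-check that does not depend on either compiler, the nearest single-precision value to each literal can be computed directly. This is a small Python sketch with made-up helper names (`f32`, `nine_digits`). It only shows which value IEEE 754 single precision stores, and makes no claim about why one toolchain displayed 0.400000001.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def nine_digits(x):
    """Format the single nearest to x with nine decimal places."""
    return "%.9f" % f32(x)

for literal in (0.4, 0.400000006, 0.40000001):
    print(literal, "->", nine_digits(literal))
```

All three literals fall within half a unit in the last place of the same float, so they are stored identically.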

Floating point addition - giving strange result..!

风流意气都作罢 posted on 2019-12-01 11:29:13
Question: When executing the following code: public class FPoint { public static void main(String[] args) { float f = 0.1f; for(int i = 0; i<9; i++) { f += 0.1f; } System.out.println(f); } } the following output is displayed: 1.0000001. But the output should be 1.0000000, right? Correct me if I'm wrong. Answer 1: 0.1 is not really "0.1" under the IEEE 754 standard. As a float, 0.1 is encoded as 0 01111011 10011001100110011001101, where 0 is the sign (positive) and 01111011 is the exponent (123, and 123 - 127 = -4 (127

Rules-of-thumb for minimising floating-point errors in C?

点点圈 posted on 2019-12-01 09:16:17
Regarding minimising the error in floating-point operations, if I have an operation such as the following in C: float a = 123.456; float b = 456.789; float r = 0.12345; a = a - (r * b); will the result of the calculation change if I split the multiplication and subtraction into separate steps, i.e.: float c = r * b; a = a - c; I am wondering whether a CPU would treat these calculations differently, so that the error might be smaller in one case. If not, as I presume, are there any good rules of thumb for mitigating floating-point error? Can I massage the data in a way that will help?
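One commonly cited rule of thumb, offered here as an illustration and not taken from the question, is to limit how many intermediate roundings accumulate when many terms are combined. The sketch below uses Python doubles, not C floats, and compares a plain running sum (`naive_sum`, a name invented for this example) with `math.fsum`, which tracks the lost low-order bits.

```python
import math

def naive_sum(values):
    """Add the values left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1] * 10
print(naive_sum(values))   # rounding errors accumulate step by step
print(math.fsum(values))   # a single correctly rounded result
```

The same idea applies in C: accumulating in a wider type, or using a compensated summation scheme, tends to reduce the accumulated error, though the details depend on the compiler and target.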

Floating point mismatch between Visual Studio 2008 and 2013

倖福魔咒の posted on 2019-12-01 08:41:52
Question: After upgrading a C++ project to Visual Studio 2013, the program's results changed because of the different floating-point behavior of the new VC compiler. The floating-point model is set to /fp:precise.
In Visual Studio 2008 (v9.0):
float f = 0.4f; // produces f = 0.400000001
float f6 = 0.400000006f; // produces f6 = 0.400000001
In Visual Studio 2013 (v12.0):
float f = 0.4f; // produces f = 0.400000006
float f1 = 0.40000001f; // produces f1 = 0.400000006
The project settings are identical

C# .Net double issue… 6.8 != 6.8?

依然范特西╮ posted on 2019-12-01 08:15:47
Question: I was doing some unit testing at work and a peculiar error popped up for one of the assertions. Note that expectedValue and actualValue are both doubles. Assert.AreEqual(expectedValue, actualValue); The exception stated that they were not equal, elaborating that "expected value: <6.8> actual value: <6.8>." The expected value is a hard-coded 6.8, and the actual value is computed from database values that pass through our classification methods (such as Equal Records, or Jenks Natural Breaks). My
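Two doubles can print identically at a limited number of digits and still differ in their last bits. The following Python sketch shows this with 0.1 + 0.2 versus 0.3 as a stand-in, since the exact values behind the 6.8 in the question are not shown. The 15-digit format is an assumption about how the failure message was rendered.

```python
import math

expected = 0.3
actual = 0.1 + 0.2  # computed value, differs from 0.3 in the last bit

# Exact comparison fails even though both print the same at 15 digits
print(expected == actual)
print(format(expected, ".15g"), format(actual, ".15g"))

# Comparing with a tolerance treats them as equal
print(math.isclose(expected, actual, rel_tol=1e-9))
```

In C# test frameworks the analogous approach is usually the Assert.AreEqual overload that takes a delta, with a tolerance chosen to suit the data.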

Guaranteed precision of sqrt function in C/C++

让人想犯罪 __ posted on 2019-12-01 07:37:28
Everyone knows the sqrt function from math.h / cmath in C/C++: it returns the square root of its argument. Of course, it has to do so with some error, because not every number can be stored precisely. But am I guaranteed that the result has some precision? For example, "it's the best approximation of the square root that can be represented in the floating-point type used", or "if you calculate the square of the result, it will be as close to the initial argument as possible using the floating-point type given"? Does the C/C++ standard say anything about it? For C99, there are no specific requirements. But most
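The two candidate guarantees in the question can be told apart experimentally. This Python sketch (requires Python 3.9+ for `math.ulp`) checks whether the platform's square root is within half a unit in the last place of the true root, using exact rational arithmetic. `is_correctly_rounded_sqrt` is a hypothetical helper, and the check describes the platform it runs on, not what the C or C++ standards promise.

```python
import math
from fractions import Fraction

def is_correctly_rounded_sqrt(x):
    """True if math.sqrt(x) lies within half an ulp of the exact root."""
    r = math.sqrt(x)
    half_ulp = Fraction(math.ulp(r)) / 2
    lo = Fraction(r) - half_ulp
    hi = Fraction(r) + half_ulp
    # Exact test: lo <= sqrt(x) <= hi, compared via squares (all positive)
    return lo * lo <= Fraction(x) <= hi * hi

for x in (2.0, 3.0, 10.0, 0.1):
    r = math.sqrt(x)
    # Columns: x, sqrt(x), does squaring give x back, is the root correctly rounded
    print(x, r, r * r == x, is_correctly_rounded_sqrt(x))
```

A best-possible root does not imply that squaring it returns the original argument: for 2.0 the root is correctly rounded on typical IEEE 754 hardware, yet its square differs from 2.0.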

Large numbers erroneously rounded in JavaScript

[亡魂溺海] posted on 2019-12-01 07:26:33
Question: See this code: <html> <head> <script src="http://www.json.org/json2.js" type="text/javascript"></script> <script type="text/javascript"> var jsonString = '{"id":714341252076979033,"type":"FUZZY"}'; var jsonParsed = JSON.parse(jsonString); console.log(jsonString, jsonParsed); </script> </head> <body> </body> </html> When I look at my console in Firefox 3.5, the value of jsonParsed is: Object id=714341252076979100 type=FUZZY, i.e. the number is rounded. I tried different values, with the same outcome
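The rounding follows from the id being larger than 2**53, the point beyond which an IEEE 754 double can no longer represent every integer. This Python sketch parses the same JSON twice: once with Python's exact integers, and once with `parse_int=float` to imitate a parser that stores every number as a double, as JavaScript does.

```python
import json

json_string = '{"id":714341252076979033,"type":"FUZZY"}'

# Python's json module keeps integers exact by default
exact = json.loads(json_string)
# Force integers into doubles to imitate JavaScript's Number type
as_double = json.loads(json_string, parse_int=float)

print(exact["id"])
print(as_double["id"])            # shortest decimal that round-trips the double
print(int(as_double["id"]))       # the integer value the double actually holds
print(exact["id"] > 2**53)        # beyond the exactly representable range
```

Near this magnitude, adjacent doubles are 128 apart, so the id snaps to the nearest multiple of 128. A common workaround is to transmit such ids as strings.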

printing the integral part of a floating point number

此生再无相见时 posted on 2019-12-01 07:07:47
I am trying to figure out how to print floating point numbers without using library functions. Printing the decimal part of a floating point number turned out to be quite easy. Printing the integral part is harder: static const int base = 2; static const char hex[] = "0123456789abcdef"; void print_integral_part(float value) { assert(value >= 0); char a[129]; // worst case is 128 digits for base 2 plus NUL char * p = a + 128; *p = 0; do { int digit = fmod(value, base); value /= base; assert(p > a); *--p = hex[digit]; } while (value >= 1); printf("%s", p); } Printing the integral part of FLT_MAX
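One way to sidestep rounding in the digit loop is to convert the float to an exact integer first and then peel off digits with integer division. The sketch below is in Python and is not the poster's approach: it leans on `math.frexp` and arbitrary-precision integers, which the original C code deliberately avoids. `integral_part_digits` is a name invented for this example.

```python
import math

DIGITS = "0123456789abcdef"

def integral_part_digits(value, base=2):
    """Return the integral part of a non-negative finite float as a digit string."""
    if not (math.isfinite(value) and value >= 0):
        raise ValueError("value must be finite and non-negative")
    # value == mantissa * 2**exponent with 0.5 <= mantissa < 1 (or both zero)
    mantissa, exponent = math.frexp(value)
    n = int(mantissa * 2**53)  # exact: a double has at most 53 significant bits
    shift = exponent - 53
    # Scale by the remaining power of two; a right shift drops the fraction
    n = n << shift if shift >= 0 else n >> -shift
    out = []
    while True:
        n, digit = divmod(n, base)  # exact integer arithmetic, no rounding
        out.append(DIGITS[digit])
        if n == 0:
            break
    return "".join(reversed(out))

FLT_MAX = float((2**24 - 1) << 104)  # largest finite IEEE 754 single
print(integral_part_digits(FLT_MAX, 10))
print(len(integral_part_digits(FLT_MAX, 2)))
```

In base 2 the result for FLT_MAX has 128 digits, which matches the buffer size assumed in the C code.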