floating-accuracy | 易学教程

Why does 0.1 + 0.4 = 0.5?

Question: We know that floating point is broken, because decimal numbers can't always be perfectly represented in binary. They're rounded to a number that can be represented in binary; sometimes that number is higher, and sometimes it's lower. In this case, using the ubiquitous IEEE 754 double format, both 0.1 and 0.4 round higher:

0.1 = 0.1000000000000000055511151231257827021181583404541015625
0.4 = 0.40000000000000002220446049250313080847263336181640625

Since both of these numbers are high, you'd ...
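
The rounding described in this excerpt can be inspected directly. The following is a minimal Java sketch, assuming IEEE 754 doubles as Java uses; the class and method names are made up for illustration. It relies on `new BigDecimal(double)`, which converts the stored binary value without any decimal rounding.

```java
import java.math.BigDecimal;

class ExactDecimalDemo {
    // Returns the exact decimal expansion of the binary value stored in d.
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    // Both operands are slightly too high, yet the rounded sum lands on 0.5.
    static boolean sumIsExactlyHalf() {
        return 0.1 + 0.4 == 0.5;
    }

    public static void main(String[] args) {
        System.out.println("0.1       = " + exact(0.1));
        System.out.println("0.4       = " + exact(0.4));
        System.out.println("0.1 + 0.4 = " + exact(0.1 + 0.4));
        System.out.println("equals 0.5? " + sumIsExactlyHalf());
    }
}
```

The exact sum of the two stored values is slightly above 0.5, but it is closer to 0.5 than to the next double above it, so the addition rounds back down.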

Why does (int)(33.46639 * 1000000) return 33466389?

Question: (int)(33.46639 * 1000000) returns 33466389. Why does this happen?

Answer 1: Floating point math isn't perfect. What every programmer should know about it. Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising because floating-point is ubiquitous in computer systems. Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point ...
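
A small Java sketch of the same effect, assuming IEEE 754 double arithmetic; the class and method names are hypothetical. The product is stored as a value just below 33466390, so the cast (which truncates toward zero) drops to 33466389, while rounding to nearest recovers the intended integer.

```java
class TruncationDemo {
    // Casting to int truncates toward zero; it does not round.
    static int truncated(double value, int scale) {
        return (int) (value * scale);
    }

    // Math.round picks the nearest long instead.
    static long rounded(double value, int scale) {
        return Math.round(value * scale);
    }

    public static void main(String[] args) {
        System.out.println(truncated(33.46639, 1000000)); // 33466389
        System.out.println(rounded(33.46639, 1000000));   // 33466390
    }
}
```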

Can a subtraction between two exactly represented floating point numbers with the same floating point be inexact?

I have 2 numbers, x and y, that are known and are represented exactly as floating point numbers. I want to know if z = x - y is always exact or if rounding errors can occur. For simple examples it's obvious:

x = 0.75 = (1 + 0.5) * 2^-1
y = 0.5 = 1 * 2^-1
z = x - y = 0.25 = 0.5 * 2^-1 = 1 * 2^-2

But what if I have x and y such that all significant digits are used and they have the same exponent? My intuition tells me the result should be exact, but I would like to see some kind of proof for this. Is it different if the result is negative?

I am assuming that you want the two numbers to have the ...
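
One way to test the intuition on concrete inputs is to compare the double result against an exact decimal subtraction. This is only an empirical check on the chosen values, not the proof the question asks for; the class name and the sample inputs are assumptions made for illustration.

```java
import java.math.BigDecimal;

class ExactSubtractionCheck {
    // True when the double result x - y equals the mathematically exact difference.
    static boolean isExact(double x, double y) {
        BigDecimal exact = new BigDecimal(x).subtract(new BigDecimal(y));
        return exact.compareTo(new BigDecimal(x - y)) == 0;
    }

    public static void main(String[] args) {
        // The example from the question.
        System.out.println(isExact(0.75, 0.5));
        // Two values in [1, 2) that share an exponent and use every significand bit.
        System.out.println(isExact(Math.nextDown(2.0), Math.nextUp(1.0)));
        // Very different exponents: the small operand is lost entirely.
        System.out.println(isExact(1.0, 1e-20));
    }
}
```

The first two checks report an exact result and the third does not, which matches the intuition that a shared exponent is what matters here.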

What is the most efficient way to round a float value to the nearest integer in java?

I've seen a lot of discussion on SO related to rounding float values, but no solid Q&A considering the efficiency aspect. So here it is: What is the most efficient (but correct) way to round a float value to the nearest integer?

(int) (mFloat + 0.5);
or
Math.round(mFloat);
or
FloatMath.floor(mFloat + 0.5);
or something else?

Preferably I would like to use something available in the standard Java libraries, not an external library that I have to import.

public class Main {
    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10; i++) {
            measurementIteration();
        } ...
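
Before measuring speed, it may help to see where the candidates disagree. The sketch below compares only the first two options from the question, on a few sample values; it says nothing about performance, and the class and method names are invented for this example.

```java
class RoundingDemo {
    // Add one half, then truncate toward zero.
    static int castRound(float f) {
        return (int) (f + 0.5f);
    }

    // Library rounding: nearest integer, ties toward positive infinity.
    static int mathRound(float f) {
        return Math.round(f);
    }

    public static void main(String[] args) {
        float[] samples = {2.7f, -2.7f, -2.5f};
        for (float f : samples) {
            System.out.println(f + " -> cast: " + castRound(f) + ", Math.round: " + mathRound(f));
        }
    }
}
```

For 2.7f both give 3, but for -2.7f the cast version gives -2 while `Math.round` gives -3, because the cast truncates toward zero rather than flooring.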

IEEE 754: How exactly does it work?

Why does the following code behave as it does in C?

float x = 2147483647; // 2^31 - 1
printf("%f\n", x); // Outputs 2147483648

Here is my thought process:

2147483647 = 0 1001 1101 1111 1111 1111 1111 1111 111
(0.11111111111111111111111)base2 = (1-(0.5)^23)base10
=> (1.11111111111111111111111)base2 = (1 + 1-(0.5)^23)base10 = (1.99999988)base10

Therefore, to convert the IEEE 754 notation back to decimal: 1.99999988 * 2^30 = 2147483520

So technically, the C program must have printed out 2147483520, right?

The value to be represented would be 2147483647. The next two values which can be represented this ...
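
The same conversion can be reproduced in Java, whose `float` is also an IEEE 754 single. This is a rough sketch with a made-up class name; it prints the float that 2147483647 converts to, along with its two representable neighbours.

```java
class FloatNeighborsDemo {
    // int-to-float conversion rounds to the nearest representable float.
    static float asFloat(int i) {
        return (float) i;
    }

    public static void main(String[] args) {
        float x = asFloat(2147483647); // Integer.MAX_VALUE, i.e. 2^31 - 1
        System.out.println((long) Math.nextDown(x)); // 2147483520
        System.out.println((long) x);                // 2147483648
        System.out.println((long) Math.nextUp(x));   // 2147483904
    }
}
```

With a 24-bit significand, floats just below 2^31 are 128 apart. 2147483647 is 127 above 2147483520 and only 1 below 2147483648, so it rounds up.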

Java float is more precise than double?

Code:

class Main {
    public static void main (String[] args) {
        System.out.print("float: ");
        System.out.println(1.35f-0.00026f);
        System.out.print("double: ");
        System.out.println(1.35-0.00026);
    }
}

Output:

float: 1.34974
double: 1.3497400000000002

??? float got the right answer, but double is adding extra stuff from nowhere. Why? Isn't double supposed to be more precise than float?

A float is 4 bytes wide, whereas a double is 8 bytes wide. Check What Every Computer Scientist Should Know About Floating-Point Arithmetic. Surely the double has more precision so it has slightly less rounding error.
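
The printed strings hide how far each result really is from 1.34974. A minimal sketch that measures both errors exactly with `BigDecimal` follows; the class and method names are assumptions for illustration.

```java
import java.math.BigDecimal;

class FloatVsDoubleError {
    static final BigDecimal TARGET = new BigDecimal("1.34974");

    // Exact distance between the float result and the decimal value 1.34974.
    static BigDecimal floatError() {
        return new BigDecimal(1.35f - 0.00026f).subtract(TARGET).abs();
    }

    // Exact distance between the double result and the decimal value 1.34974.
    static BigDecimal doubleError() {
        return new BigDecimal(1.35 - 0.00026).subtract(TARGET).abs();
    }

    public static void main(String[] args) {
        System.out.println("float error:  " + floatError());
        System.out.println("double error: " + doubleError());
    }
}
```

The float error comes out many orders of magnitude larger. `println` shows only as many digits as are needed to identify the value within its own type, which is why the float output looks cleaner.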

What's the benefit of accepting floating point inaccuracy in C#

I've had this problem on my mind the last few days, and I'm struggling to phrase my question. However, I think I've nailed what I want to know. Why does C# accept the inaccuracy by using floating points to store data? And what's the benefit of using it over other methods? For example, Math.Pow(Math.Sqrt(2),2) is not exact in C#. There are programming languages that can calculate it exactly (for example, Mathematica). One argument I could think of is that calculating it exactly is a lot slower than just coping with the inaccuracy, but Mathematica & Matlab are used to calculate gigantic ...
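
The same inexactness shows up in Java, which uses the same IEEE 754 double format. The sketch below is a Java analogue rather than the C# call from the question, and the helper names are hypothetical. It squares the rounded square root and then compares with a relative tolerance, one common way of coping with the inaccuracy.

```java
class SqrtSquareDemo {
    // sqrt(v) is rounded to a double, so squaring it need not give v back.
    static double squareOfSqrt(double v) {
        double s = Math.sqrt(v);
        return s * s;
    }

    // Relative-tolerance comparison; the tolerance is chosen by the caller.
    static boolean nearlyEqual(double a, double b, double relTol) {
        return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
    }

    public static void main(String[] args) {
        double r = squareOfSqrt(2.0);
        System.out.println(r);                           // 2.0000000000000004
        System.out.println(r == 2.0);                    // false
        System.out.println(nearlyEqual(r, 2.0, 1e-12));  // true
    }
}
```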

Is it possible to get 0 by subtracting two unequal floating point numbers?

Question: Is it possible to get division by 0 (or infinity) in the following example?

public double calculation(double a, double b) {
    if (a == b) {
        return 0;
    } else {
        return 2 / (a - b);
    }
}

In normal cases it will not, of course. But what if a and b are very close: can (a - b) result in 0 due to the precision of the calculation? Note that this question is for Java, but I think it will apply to most programming languages.

Answer 1: In Java, a - b is never equal to 0 if a != b. This is because Java mandates ...
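
A quick experiment with the closest possible pair of inputs, assuming Java's IEEE 754 doubles with subnormal support. The method is copied from the question as a static method; the wrapper class and the choice of inputs are illustrative assumptions.

```java
class SubtractionUnderflowDemo {
    // The method from the question, made static for easy calling.
    static double calculation(double a, double b) {
        if (a == b) {
            return 0;
        } else {
            return 2 / (a - b);
        }
    }

    public static void main(String[] args) {
        double a = Double.MIN_NORMAL;  // smallest positive normal double
        double b = Math.nextUp(a);     // the very next double above it
        System.out.println(a - b);             // a tiny subnormal, not zero
        System.out.println(calculation(a, b)); // -Infinity
    }
}
```

The difference here is `-Double.MIN_VALUE`, which is nonzero, so there is no division by zero. The quotient still overflows to negative infinity, so the "or infinity" half of the question needs its own guard.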

Minimize floating point error when adding multiple floating point variables

In my C++ app I have a vector of doubles in the range (0,1) and I have to calculate its total as accurately as possible. It feels like this issue should have been addressed before, but I can't find anything. Obviously, iterating through each item in the vector and doing sum += vect[i] accumulates a significant error if the vector is large and some items are significantly smaller than the others. My current solution is this function:

double sumDoubles(vector<double> arg) // pass by copy
{
    sort(arg.rbegin(), arg.rend()); // sort in reverse order
    for(int i=1;i<=arg.size();i*=2)
        for(int ...
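
One well-known alternative to the sort-based approach in the question is Kahan (compensated) summation, which carries the rounding error of each addition forward in a second variable. The sketch below is in Java rather than C++ and is not the asker's function; the class name, method names and sample data are assumptions chosen to make the effect visible.

```java
import java.util.Arrays;

class CompensatedSum {
    // Plain left-to-right accumulation.
    static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    // Kahan summation: c holds the low-order bits lost by the previous addition.
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double c = 0.0;
        for (double v : values) {
            double y = v - c;    // apply the pending correction
            double t = sum + y;  // low-order bits of y may be lost here
            c = (t - sum) - y;   // recover what was lost (negated)
            sum = t;
        }
        return sum;
    }

    // One large item followed by a million items too small to register on their own.
    static double[] sample() {
        double[] values = new double[1_000_001];
        values[0] = 0.5;
        Arrays.fill(values, 1, values.length, 1e-17);
        return values;
    }

    public static void main(String[] args) {
        double[] values = sample();
        System.out.println("naive: " + naiveSum(values));
        System.out.println("kahan: " + kahanSum(values));
    }
}
```

On this sample the naive loop returns exactly 0.5, because each 1e-17 is smaller than half the spacing between doubles near 0.5. The compensated loop ends up within rounding error of 0.5 + 1e-11.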

Can a calculation of floating point differ on different processors? (+passing doubles between C# and C)

I have an application written in C# that also invokes some C code. The C# code takes a double as input, performs some calculations on it, passes it to the native layer, which performs its own calculations on it, and then passes the result back to the C# layer. If I run the same exe/DLLs on different machines (all of them x64 by Intel), is it possible that the final result I get will differ between machines?

If you use the same executable(s) the results should be the same. However, it is worth noting that floating-point calculations are usually highly customizable by a number of ...
