floating-accuracy | 易学教程

Why does 4.1%2 return 0.0999999999999996 in Ruby, but 4.2%2==0.2?

Question: Why does 4.1%2 return 0.0999999999999996, but 4.2%2==0.2? Answer 1: See here: What Every Programmer Should Know About Floating-Point Arithmetic. There are infinitely many real numbers, but computers work with a finite number of bits (32 or 64 bits today). As a result, floating-point arithmetic done by computers cannot represent all the real numbers. 0.1 is one of the numbers that cannot be represented exactly. Note that this is not an issue specific to Ruby; it affects all programming languages because it comes from the way computers represent real
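The effect described in the answer can be reproduced outside Ruby. The following is a minimal sketch in Python (chosen here only for illustration; the variable names are assumptions, not part of the original question). It prints the exact value of the double that the literal 4.1 is stored as, and shows that the remainder is computed exactly from that stored value rather than from the decimal 4.1.

```python
from decimal import Decimal
from fractions import Fraction

# The literal 4.1 is stored as the nearest binary double, which is not exactly 4.1.
stored_4_1 = Decimal(4.1)   # exact decimal expansion of the stored double
remainder = 4.1 % 2         # remainder of the stored double, not of the decimal 4.1

print(stored_4_1)           # exact value actually held in memory
print(Decimal(remainder))   # exact value of the remainder
print(remainder == 0.1)     # False

# The remainder is exactly (stored value of 4.1) - 4; no extra rounding happens.
remainder_is_exact = Fraction(remainder) == Fraction(4.1) - 4
print(remainder_is_exact)
```

The discrepancy is therefore already present in the stored operand; the modulo operation only makes it visible.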

Can a subtraction between two exactly represented floating point numbers with the same floating point be inexact?

Question: I have two numbers, x and y, that are known and are represented exactly as floating point numbers. I want to know if z = x - y is always exact or if rounding errors can occur. For simple examples it's obvious:
x = 0.75 = (1 + 0.5) * 2^-1
y = 0.5 = 1 * 2^-1
z = x - y = 0.25 = 0.5 * 2^-1 = 1 * 2^-2
But what if I have x and y such that all significant digits are used and they have the same exponent? My intuition tells me the result should be exact, but I would like to see some kind of proof for
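The intuition in the question matches a known result (Sterbenz's lemma: if y/2 <= x <= 2y, then x - y is computed exactly), and two positive floats with the same exponent always satisfy that condition. The sketch below is an empirical check in Python, not a proof; the helper names `same_exponent` and `subtraction_is_exact` are made up for this illustration. It compares each floating-point difference with the exact rational difference.

```python
from fractions import Fraction
import math
import random

def same_exponent(x: float, y: float) -> bool:
    """True if x and y share the same binary exponent."""
    return math.frexp(x)[1] == math.frexp(y)[1]

def subtraction_is_exact(x: float, y: float) -> bool:
    """Compare the float result of x - y with the exact rational result."""
    return Fraction(x) - Fraction(y) == Fraction(x - y)

rng = random.Random(0)  # fixed seed so the run is repeatable
candidates = [(rng.uniform(1.0, 2.0), rng.uniform(1.0, 2.0)) for _ in range(10_000)]
pairs = [(x, y) for x, y in candidates if same_exponent(x, y)]

all_exact = all(subtraction_is_exact(x, y) for x, y in pairs)
print(len(pairs), all_exact)
```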

What is the most efficient way to round a float value to the nearest integer in Java?

Question: I've seen a lot of discussion on SO related to rounding float values, but no solid Q&A considering the efficiency aspect. So here it is: What is the most efficient (but correct) way to round a float value to the nearest integer? (int) (mFloat + 0.5); or Math.round(mFloat); or FloatMath.floor(mFloat + 0.5); or something else? Preferably I would like to use something available in the standard Java libraries, not an external library that I have to import. Answer 1: public class Main { public static
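Before comparing speed, it helps to see where the "add 0.5" candidates can be incorrect. The sketch below is a Python analogue, not the Java code from the question: Python floats are 64-bit doubles, whereas the question concerns 32-bit floats, and the function names are assumptions made for illustration. It mimics `(int) (x + 0.5)` (truncation toward zero) and `floor(x + 0.5)`.

```python
import math

def add_half_then_truncate(x: float) -> int:
    """Analogue of Java's (int) (x + 0.5): truncates toward zero."""
    return int(x + 0.5)

def add_half_then_floor(x: float) -> int:
    """Analogue of floor(x + 0.5)."""
    return math.floor(x + 0.5)

# Truncation toward zero rounds negative inputs in the wrong direction.
print(add_half_then_truncate(-2.7))  # -2, although the nearest integer is -3
print(add_half_then_floor(-2.7))     # -3

# The largest double below 0.5: adding 0.5 rounds the sum up to exactly 1.0.
just_below_half = 0.49999999999999994
print(add_half_then_floor(just_below_half))  # 1, although the nearest integer is 0
```

Any benchmark of the candidates should therefore first pin down which inputs (negative values, values just below a half) need to be handled correctly.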

What's the benefit of accepting floating point inaccuracy in C#

Question: I've had this problem on my mind for the last few days, and I've struggled to phrase my question. However, I think I've nailed what I want to know. Why does C# accept the inaccuracy that comes with using floating point to store data? And what's the benefit of using it over other methods? For example, Math.Pow(Math.Sqrt(2),2) is not exact in C#. There are programming languages that can calculate it exactly (for example, Mathematica). One argument I could think of is that calculating it exactly is a lot slower
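The same inexactness appears in any language that uses IEEE 754 doubles. A minimal Python sketch of the `Math.Pow(Math.Sqrt(2),2)` example follows; Python is used here only for illustration, and the tolerance-based comparison is one common workaround, not something the question proposes.

```python
import math

squared = math.sqrt(2) ** 2

print(squared)                     # close to 2, but not exactly 2
print(squared == 2.0)              # False
print(math.isclose(squared, 2.0))  # True: comparison with a relative tolerance
```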

Can a calculation of floating point differ on different processors? (+passing doubles between C# and C)

Question: I have an application written in C# that invokes some C code as well. The C# code gets a double as input, performs some calculations on it, passes it to the native layer, which performs its own calculations on it, and then passes it back to the C# layer. If I run the same exe/dlls on different machines (all of them x64 by Intel), is it possible that the final result I get will differ between machines? Answer 1: If you use the same executable(s) the results should be the same.

Efficiently computing (a - K) / (a + K) with improved accuracy

Question: In various contexts, for example argument reduction for mathematical functions, one needs to compute (a - K) / (a + K), where a is a positive variable argument and K is a constant. In many cases, K is a power of two, which is the use case relevant to my work. I am looking for efficient ways to compute this quotient more accurately than can be accomplished with the straightforward division. Hardware support for fused multiply-add (FMA) can be assumed, as this operation is provided by
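To judge any improved scheme, one first needs a baseline for the straightforward division. The sketch below is in Python and does not implement the FMA-based method the question asks about; it only measures how far the naive double-precision result is from the exact rational quotient, in units in the last place. The function names and the sample values of a are assumptions, and `math.ulp` requires Python 3.9 or later.

```python
from fractions import Fraction
import math

def naive_quotient(a: float, k: float) -> float:
    """Straightforward evaluation: two roundings in the operands, one in the division."""
    return (a - k) / (a + k)

def error_in_ulps(a: float, k: float) -> float:
    """Distance between the naive result and the exact quotient, in ulps of the result."""
    exact = (Fraction(a) - Fraction(k)) / (Fraction(a) + Fraction(k))
    computed = naive_quotient(a, k)
    return float(abs(Fraction(computed) - exact) / Fraction(math.ulp(computed)))

K = 1.0  # a power of two, as in the use case described in the question
samples = [0.3, 0.9, 1.7, 3.3, 100.1]
errors = [error_in_ulps(a, K) for a in samples]
for a, err in zip(samples, errors):
    print(f"a = {a}: error = {err:.3f} ulp")
```

A correctly rounded result would stay within 0.5 ulp, so the gap between that and the measured errors is the room an improved method has to work with.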

Accurate computation of scaled complementary error function, erfcx()

Question: The (exponentially) scaled complementary error function, commonly designated by erfcx, is defined mathematically as $\mathrm{erfcx}(x) := e^{x^2}\,\mathrm{erfc}(x)$. It frequently occurs in diffusion problems in physics as well as chemistry. While some mathematical environments, such as MATLAB and GNU Octave, provide this function, it is absent from the C standard math library, which only provides erf() and erfc(). While it is possible to implement one's own erfcx() based directly on the mathematical definition,
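A direct translation of the definition is easy to write, and it also shows why that approach has a limited range. The sketch below uses Python's `math.exp` and `math.erfc` rather than the C library the question targets; `erfcx_naive` and `naive_overflows` are names invented for this illustration. For moderately large x, exp(x^2) exceeds the double range even though the true value of erfcx(x) is small and perfectly representable.

```python
import math

def erfcx_naive(x: float) -> float:
    """Direct translation of the definition: exp(x^2) * erfc(x)."""
    return math.exp(x * x) * math.erfc(x)

def naive_overflows(x: float) -> bool:
    """True if the naive formula fails because exp(x^2) overflows a double."""
    try:
        erfcx_naive(x)
    except OverflowError:
        return True
    return False

print(erfcx_naive(1.0))       # fine for small arguments
print(naive_overflows(30.0))  # True: exp(900) is out of range for a double
```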

Why would a variable of type double have an unexpected result?

Question: My sanity check fails because a double variable does not contain the expected result, which is really bizarre. double a = 1117.54 + 8561.64 + 13197.37; double b = 22876.55; Console.WriteLine("{0} == {1}: {2}", a, b, a == b); This gives the output: 22876.55 == 22876.55: False. Further inspection shows that variable a in fact contains the value 22876.550000000003. This is reproducible in VB.NET as well. Am I sane? What is going on? Answer 1: Floating point types are not always capable of accurately
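The failing sanity check can be reproduced with any IEEE 754 double arithmetic. The sketch below uses Python instead of C# and shows two common ways around it, neither taken from the original answer: comparing with a tolerance, or doing the sum in decimal arithmetic (in C#, the `decimal` type plays a similar role).

```python
from decimal import Decimal
import math

a = 1117.54 + 8561.64 + 13197.37
b = 22876.55

print(a == b)                              # False: the sum picks up rounding error
print(math.isclose(a, b, rel_tol=1e-12))   # True: equal within a relative tolerance

# Decimal arithmetic represents these two-decimal-place values exactly.
exact_sum = Decimal("1117.54") + Decimal("8561.64") + Decimal("13197.37")
print(exact_sum == Decimal("22876.55"))    # True
```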

Float24 (24 bit floating point) to Hex?

Question: I'm using a 24-bit float to store a floating-point value with the MRK III compiler from NXP. It stores the 24-bit float value as 3 hex bytes in data memory. Now, when I use IEEE 754 floating-point conversion to retrieve the number back from binary to real, I get something very strange. Let me illustrate with an example. Note: "since my compiler supports float 24 bit (along with float 32), I'm assigning value something like this." Sample program: float24 f24test; float f32test; f32test=

How to avoid floating point arithmetic issues?

Question: Python (and almost anything else) has known limitations when working with floating point numbers (nice overview provided here). While the problem is described well in the documentation, it avoids providing any approach to fixing it. With this question I am looking for a more or less robust way to avoid situations like the following: print(math.floor(0.09/0.015)) # >> 6 print(math.floor(0.009/0.0015)) # >> 5 print(99.99-99.973) # >> 0.016999999999825377 print(.99-.973) # >> 0
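One commonly used approach for values that are decimal by nature is the standard-library `decimal` module, constructing each value from a string so that no binary rounding happens on input. The sketch below reruns the question's four expressions that way; it is one possible option rather than the accepted answer, and the variable names are assumptions.

```python
from decimal import Decimal
import math

# Build every operand from a string, not from a float literal.
quotient_a = math.floor(Decimal("0.09") / Decimal("0.015"))
quotient_b = math.floor(Decimal("0.009") / Decimal("0.0015"))
difference_a = Decimal("99.99") - Decimal("99.973")
difference_b = Decimal(".99") - Decimal(".973")

print(quotient_a)    # 6
print(quotient_b)    # 6
print(difference_a)  # 0.017
print(difference_b)  # 0.017
```

The trade-off is speed and the need to keep floats out of the calculation entirely: `Decimal(0.09)` built from a float would carry the binary rounding error along with it.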