floating-accuracy

Why does 4.1%2 return 0.0999999999999996 in Ruby, but 4.2%2==0.2?

空扰寡人 submitted on 2019-12-22 06:47:07
Question: Why does 4.1%2 return 0.0999999999999996, but 4.2%2==0.2?
Answer 1: See here: What Every Programmer Should Know About Floating-Point Arithmetic. There are infinitely many real numbers, but computers work with a finite number of bits (32 or 64 bits today). As a result, floating-point arithmetic done by computers cannot represent every real number exactly, and 0.1 is one of the numbers it cannot represent. Note that this is not an issue specific to Ruby; it affects all programming languages, because it comes from the way computers represent real
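As a rough illustration of the answer's point, the following Python sketch prints the exact values that are actually stored for 0.1 and 4.1. It assumes the platform uses IEEE 754 doubles (as Python floats do, and as Ruby's Float normally does); the helper name exact_value is made up for this example.

```python
from decimal import Decimal
from fractions import Fraction

def exact_value(x: float) -> Decimal:
    """Return the exact decimal expansion of the binary double x (hypothetical helper)."""
    return Decimal(x)

# 0.1 has no finite binary expansion, so the nearest double is stored instead.
print(exact_value(0.1))
# The stored 4.1 is slightly below 4.1, so the remainder is slightly below 0.1.
print(exact_value(4.1))
print(exact_value(4.1 % 2))

# The remainder itself is computed without error; the inexactness was already in 4.1.
remainder_is_exact = Fraction(4.1) - 4 == Fraction(4.1 % 2)
print(remainder_is_exact)
```

Seen this way, the modulo operation is not at fault: it faithfully reports the remainder of the number that was stored, which is not quite 4.1.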

Can a subtraction between two exactly represented floating point numbers with the same floating point be inexact?

流过昼夜 submitted on 2019-12-21 23:07:05
Question: I have two numbers, x and y, that are known and are represented exactly as floating point numbers. I want to know if z = x - y is always exact or if rounding errors can occur. For simple examples it's obvious:
    x = 0.75 = (1 + 0.5) * 2^-1
    y = 0.5 = 1 * 2^-1
    z = x - y = 0.25 = 0.5 * 2^-1 = 1 * 2^-2
But what if I have x and y such that all significant digits are used and they have the same exponent? My intuition tells me the result should be exact, but I would like to see some kind of proof for
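The same-exponent case can at least be probed empirically. The sketch below is a sampling check, not a proof; it assumes IEEE 754 doubles, and the function names are invented for illustration. The relevant known result is the Sterbenz lemma: if y/2 <= x <= 2y, then x - y is computed exactly, and two positive floats with the same exponent always satisfy that condition.

```python
import math
import random
from fractions import Fraction

def subtraction_is_exact(x, y):
    """True if the rounded float result x - y equals the exact rational difference."""
    return Fraction(x) - Fraction(y) == Fraction(x - y)

def random_double_in_1_2(rng):
    """Build a double in [1, 2) from a random 53-bit significand, so the exponent is fixed."""
    significand = rng.randrange(2**52, 2**53)
    return math.ldexp(significand, -52)

rng = random.Random(0)  # fixed seed so the run is repeatable
all_exact = all(
    subtraction_is_exact(random_double_in_1_2(rng), random_double_in_1_2(rng))
    for _ in range(10_000)
)
print(all_exact)

# With very different exponents the difference can need more than 53 bits.
print(subtraction_is_exact(1.0, 2.0**-60))
```

Comparing through Fraction avoids any rounding in the check itself, because every finite double converts to a rational number exactly.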

What is the most efficient way to round a float value to the nearest integer in java?

心不动则不痛 submitted on 2019-12-21 20:46:36
Question: I've seen a lot of discussion on SO related to rounding float values, but no solid Q&A considering the efficiency aspect. So here it is: What is the most efficient (but correct) way to round a float value to the nearest integer?
    (int) (mFloat + 0.5);
or
    Math.round(mFloat);
or
    FloatMath.floor(mFloat + 0.5);
or something else? Preferably I would like to use something available in the standard Java libraries, not some external library that I have to import.
Answer 1: public class Main { public static
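On the "but correct" part of the question, the add-0.5 idioms have two well-known edge cases. The sketch below uses Python with 64-bit doubles purely to illustrate them; it is not the Java code from the question (which uses 32-bit float), and the function names are hypothetical.

```python
import math

def round_add_half_truncate(x):
    """Mimics the (int)(x + 0.5) idiom: add one half, then truncate toward zero."""
    return int(x + 0.5)

def round_add_half_floor(x):
    """Mimics floor(x + 0.5): add one half, then round down."""
    return math.floor(x + 0.5)

# Truncation goes toward zero, so negative inputs come out wrong.
print(round_add_half_truncate(-2.7), round_add_half_floor(-2.7))

# The largest double below 0.5: adding 0.5 rounds the sum up to exactly 1.0.
just_below_half = 0.49999999999999994
print(just_below_half < 0.5, round_add_half_floor(just_below_half))
```

Whether either edge case matters depends on the input range, so any speed comparison between the candidates should be made only among variants that are correct for that range.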

What's the benefit of accepting floating point inaccuracy in C#?

ぐ巨炮叔叔 submitted on 2019-12-21 16:49:45
Question: I've had this problem on my mind for the last few days, and I've been struggling to phrase my question. However, I think I've nailed down what I want to know. Why does C# accept the inaccuracy of using floating point to store data? And what's the benefit of using it over other methods? For example, Math.Pow(Math.Sqrt(2),2) is not exact in C#. There are programming languages that can calculate it exactly (for example, Mathematica). One argument I could think of is that calculating it exactly is a lot slower
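The trade-off the question hints at can be sketched in Python, which uses the same IEEE 754 double format as C#'s double (an assumption about the platform). Floats have a fixed size and are handled directly by hardware but round every result; Python's Fraction type is exact for rational numbers but cannot represent sqrt(2) at all, which would need a symbolic system.

```python
import math
from fractions import Fraction

# Squaring the rounded square root does not land back on 2.0.
root = math.sqrt(2)
squared = root * root
print(squared == 2.0, math.isclose(squared, 2.0))

# Rational arithmetic is exact, but each value carries arbitrary-size integers.
tenth_float_sum = 0.1 + 0.1 + 0.1
tenth_exact_sum = Fraction(1, 10) * 3
print(tenth_float_sum == 0.3, tenth_exact_sum == Fraction(3, 10))
```

The float results are wrong only in the last bit or so, which is the inaccuracy a language accepts in exchange for constant-size values and hardware speed.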

Can a calculation of floating point differ on different processors? (+passing doubles between C# and C)

时间秒杀一切 submitted on 2019-12-21 12:39:47
Question: I have an application written in C# that invokes some C code as well. The C# code gets a double as input, performs some calculations on it, passes it to the native layer, which performs its own calculations on it, and then passes the result back to the C# layer. If I run the same exe/DLLs on different machines (all of them x64 Intel), is it possible that the final result will differ between machines?
Answer 1: If you use the same executable(s) the results should be the same.

Efficiently computing (a - K) / (a + K) with improved accuracy

有些话、适合烂在心里 submitted on 2019-12-21 06:55:54
Question: In various contexts, for example argument reduction for mathematical functions, one needs to compute (a - K) / (a + K), where a is a positive variable argument and K is a constant. In many cases, K is a power of two, which is the use case relevant to my work. I am looking for efficient ways to compute this quotient more accurately than can be accomplished with the straightforward division. Hardware support for fused multiply-add (FMA) can be assumed, as this operation is provided by
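To judge whether any alternative beats the straightforward division, one needs a reference for the error of the straightforward version. The sketch below is only such a measuring harness under stated assumptions (IEEE 754 doubles, a not equal to K, no overflow or underflow); it does not implement the improved method the question asks about, and the function names are hypothetical.

```python
from fractions import Fraction

def quotient_naive(a, k):
    """Straightforward evaluation: one subtraction, one addition, one division."""
    return (a - k) / (a + k)

def relative_error(a, k):
    """Relative error of quotient_naive against the exact rational quotient."""
    exact = (Fraction(a) - Fraction(k)) / (Fraction(a) + Fraction(k))
    return abs(Fraction(quotient_naive(a, k)) - exact) / abs(exact)

UNIT_ROUNDOFF = Fraction(1, 2**53)  # half the spacing of doubles just above 1.0

# Report the error in units of the unit roundoff for a few sample arguments.
for a in (0.3, 1.7, 2.5, 1000.1):
    err = relative_error(a, 2.0)
    print(a, float(err / UNIT_ROUNDOFF))
```

Each of the three operations contributes at most one unit roundoff of relative error, so the naive result stays within a few unit roundoffs; a candidate replacement can be run through the same harness for comparison.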

Accurate computation of scaled complementary error function, erfcx()

拟墨画扇 submitted on 2019-12-21 05:03:03
Question: The (exponentially) scaled complementary error function, commonly designated by erfcx, is defined mathematically as erfcx(x) := e^(x^2) erfc(x). It frequently occurs in diffusion problems in physics as well as chemistry. While some mathematical environments, such as MATLAB and GNU Octave, provide this function, it is absent from the C standard math library, which only provides erf() and erfc(). While it is possible to implement one's own erfcx() based directly on the mathematical definition,
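A direct transcription of the definition is easy to write, and a short Python sketch shows one reason it is not sufficient on its own. This is an illustration only, not the implementation the question goes on to discuss; erfcx_naive is a name invented here.

```python
import math

def erfcx_naive(x):
    """exp(x*x) * erfc(x), taken directly from the definition (illustration only)."""
    return math.exp(x * x) * math.erfc(x)

print(erfcx_naive(0.0))   # both factors are exactly 1.0 here
print(erfcx_naive(1.0))

try:
    erfcx_naive(30.0)
except OverflowError:
    # exp(900) exceeds the double range even though erfcx(30) itself is about 0.0188
    print("overflow at x = 30")
```

For large positive x the first factor overflows while the second underflows toward zero, although their product is a perfectly ordinary number close to 1/(x*sqrt(pi)).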

Why would a variable of type double have an unexpected result?

蓝咒 submitted on 2019-12-20 05:29:07
Question: My sanity check fails because a double variable does not contain the expected result; it's really bizarre.
    double a = 1117.54 + 8561.64 + 13197.37;
    double b = 22876.55;
    Console.WriteLine("{0} == {1}: {2}", a, b, a == b);
Gives us this output:
    22876.55 == 22876.55: False
Further inspection shows us that variable a, in fact, contains the value 22876.550000000003. This is reproducible in VB.NET as well. Am I sane? What is going on?
Answer 1: Floating point types are not always capable of accurately
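The usual remedy for a sanity check like this is to compare within a tolerance instead of testing for exact equality. A minimal Python sketch of that idea follows, using the same numbers; Python floats are assumed to behave like C# doubles here, and the tolerance of 1e-9 is an arbitrary choice for illustration.

```python
import math

a = 1117.54 + 8561.64 + 13197.37
b = 22876.55

print(repr(a), repr(b))                   # repr shows enough digits to tell the two apart
print(a == b)                             # exact bit-for-bit comparison
print(math.isclose(a, b, rel_tol=1e-9))   # comparison within a relative tolerance
```

The tolerance should reflect how much rounding error the calculation can plausibly accumulate; for amounts of money, storing integer cents or using a decimal type sidesteps the comparison problem altogether.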

Float24 (24 bit floating point) to Hex?

喜夏-厌秋 submitted on 2019-12-20 05:27:47
Question: I'm using a 24-bit float to store a floating point value with the MRK III compiler from NXP. It stores the 24-bit float value as 3 bytes of hex in data memory. Now, when I use IEEE 754 floating point conversion to retrieve the number back from binary to real, I get something very strange. Let me illustrate with an example. Note: since my compiler supports 24-bit float (along with 32-bit float), I'm assigning the value like this.
Sample program:
    float24 f24test;
    float f32test;
    f32test=

How to avoid floating point arithmetic issues?

∥☆過路亽.° submitted on 2019-12-20 05:23:08
Question: Python (and almost anything else) has known limitations when working with floating point numbers (nice overview provided here). While the problem is described well in the documentation, it does not provide any approach to fixing it. With this question I am seeking a more or less robust way to avoid situations like the following:
    print(math.floor(0.09/0.015)) # >> 6
    print(math.floor(0.009/0.0015)) # >> 5
    print(99.99-99.973) # >> 0.016999999999825377
    print(.99-.973) # >> 0
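One commonly used approach for inputs that are decimal by nature is the standard library's decimal module, which stores decimal digits exactly. The sketch below reruns the question's expressions that way; it is one option under the assumption that the values originate as decimal strings, not a general cure for rounding.

```python
import math
from decimal import Decimal

# Build the values from strings so no binary rounding happens on input.
quotient_a = math.floor(Decimal("0.09") / Decimal("0.015"))
quotient_b = math.floor(Decimal("0.009") / Decimal("0.0015"))
difference_a = Decimal("99.99") - Decimal("99.973")
difference_b = Decimal("0.99") - Decimal("0.973")

print(quotient_a, quotient_b)      # → 6 6
print(difference_a, difference_b)  # → 0.017 0.017
```

Constructing a Decimal from a float literal, as in Decimal(0.09), would import the binary rounding error along with the value, which is why the strings matter.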