floating-point-precision

numpy matrix inversion rounding errors

╄→尐↘猪︶ㄣ submitted on 2019-11-26 23:43:41
Question: I am getting a very strange value for the (1,1) entry of my BinvA matrix. I am just trying to invert the B matrix and do a (B^-1)A multiplication. When I do the calculation by hand, the (1,1) entry is supposed to be 0, but instead I get 1.11022302e-16. How can I fix it? I know floating-point numbers can't be represented to full accuracy, but why is this giving me such an inaccurate result instead of rounding to 0, and is there any way I can make it more accurate? Here is my code: import numpy as …
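For reference, 1.11022302e-16 is 2^-53, half of double-precision machine epsilon: ordinary rounding residue from the inversion, not a numpy bug. Within numpy, np.round(BinvA, 12) or a comparison via np.allclose will clean up or test such entries. The effect is the same in any IEEE-754 language; below is a minimal C++ sketch (my own illustration, not the asker's matrices) of the residue and the usual tolerance-based remedy:

```cpp
#include <cstdio>
#include <cmath>

int main() {
    // A value that is exactly 0 on paper comes out as a small
    // multiple of machine epsilon once each operand is rounded.
    double r = 0.1 + 0.2 - 0.3;
    std::printf("raw residue: %g\n", r);   // ~5.55e-17, not 0

    // Usual remedy: treat anything below a tolerance as zero.
    const double tol = 1e-12;
    double cleaned = (std::fabs(r) < tol) ? 0.0 : r;
    std::printf("cleaned:     %g\n", cleaned);  // 0
    return 0;
}
```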

Get next smallest Double number

笑着哭i submitted on 2019-11-26 21:59:57
Question: As part of a unit test, I need to test some boundary conditions. One method accepts a System.Double argument. Is there a way to get the next-smallest double value (i.e. decrement the mantissa by one unit)? I considered using Double.Epsilon, but this is unreliable, as it is only the smallest delta from zero and so doesn't work for larger values (i.e. 9999999999 - Double.Epsilon == 9999999999). So what is the algorithm or code needed such that: NextSmallest(Double d) < d ...is always true.
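The question targets C#'s System.Double, but the operation wanted is IEEE-754's "next representable value toward −∞"; in .NET one can get it by reinterpreting the bits with BitConverter.DoubleToInt64Bits. C and C++ expose it directly as nextafter, so here is a sketch of the idea in C++:

```cpp
#include <cmath>
#include <cstdio>

// The largest double strictly less than d (for finite d).
double NextSmallest(double d) {
    return std::nextafter(d, -INFINITY);
}

int main() {
    double d = 9999999999.0;
    double below = NextSmallest(d);
    std::printf("d     = %.17g\n", d);
    std::printf("below = %.17g\n", below);
    std::printf("NextSmallest(d) < d: %s\n", below < d ? "true" : "false");
    return 0;
}
```

Unlike subtracting Double.Epsilon, this steps by exactly one unit in the last place at any magnitude.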

What's the C++ suffix for long double literals?

痴心易碎 submitted on 2019-11-26 20:05:00
Question: In C++ (and C), a floating-point literal without a suffix defaults to double, while the suffix f implies float. But what is the suffix to get a long double? Without knowing it, I would define, say, const long double x = 3.14159265358979323846264338328; But my worry is that the variable x then contains fewer than 64 significant bits of 3.14159265358979323846264338328, because this is a double literal. Is this worry justified? Answer 1: From the C++ Standard: The type of a floating literal is double …
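The suffix is l or L (uppercase is the readable choice, since lowercase l is easily confused with 1), and the worry is justified: without the suffix, the literal is rounded to double precision before x is initialized. A minimal check, assuming a platform where long double is wider than double (e.g. the x87 80-bit format):

```cpp
#include <cstdio>

int main() {
    // The first literal is parsed as a double, so precision is lost
    // before the assignment to long double happens.
    const long double from_double = 3.14159265358979323846264338328;   // double literal
    const long double from_long   = 3.14159265358979323846264338328L;  // long double literal

    std::printf("%.21Lf\n", from_double);  // correct only to double precision
    std::printf("%.21Lf\n", from_long);    // more correct digits
    return 0;
}
```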

next higher/lower IEEE double precision number

对着背影说爱祢 submitted on 2019-11-26 18:24:09
Question: I am doing high-precision scientific computations. In looking for the best representation of various effects, I keep coming up with reasons to want the next higher (or lower) double-precision number available. Essentially, what I want to do is add one to the least significant bit of the internal representation of a double. The difficulty is that the IEEE format is not totally uniform. If one were to use low-level code and actually add one to the least significant bit, the resulting …
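std::nextafter (C99 nextafter) does exactly this and already copes with the non-uniform corners of the IEEE format: subnormals, exponent boundaries, and signed zero. A sketch comparing it with the raw add-one-to-the-bits trick the asker describes, which happens to work directly for positive finite doubles because IEEE-754 bit patterns order the same way as the values they represent:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <cstdio>

// Raw ULP increment, valid for positive finite doubles only;
// negative numbers would need the bits decremented instead.
double next_up_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // well-defined, unlike pointer casts
    ++bits;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

int main() {
    double d = 1.0;
    std::printf("%.17g\n", std::nextafter(d, INFINITY));  // 1 + 2^-52
    std::printf("%.17g\n", next_up_bits(d));              // same value
    return 0;
}
```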

Correct use of std::cout.precision() - not printing trailing zeros

牧云@^-^@ submitted on 2019-11-26 17:50:49
I see many questions about precision for floating-point numbers, but specifically I want to know why this code

```cpp
#include <iostream>
#include <stdlib.h>

int main() {
    int a = 5;
    int b = 10;
    std::cout.precision(4);
    std::cout << (float)a/(float)b << "\n";
    return 0;
}
```

shows 0.5? I expect to see 0.5000. Is it because of the original integer data types?

```cpp
#include <iostream>
#include <stdlib.h>
#include <iomanip>

int main() {
    int a = 5;
    int b = 10;
    std::cout << std::fixed;
    std::cout << std::setprecision(4);
    std::cout << (float)a/(float)b << "\n";
    return 0;
}
```

You need to pass std::fixed: in the default notation, precision counts significant digits and trailing zeros are dropped, whereas std::fixed makes precision count digits after the decimal point, so 0.5000 is printed.

Is the most significant decimal digits precision that can be converted to binary and back to decimal without loss of significance 6 or 7.225?

泪湿孤枕 submitted on 2019-11-26 17:46:11
Question: I've come across two different precision formulas for floating-point numbers: ⌊(N − 1) · log10(2)⌋ = 6 decimal digits (single precision), and N · log10(2) ≈ 7.225 decimal digits (single precision), where N = 24 significant bits (single precision). The first formula is found at the top of page 4 of "IEEE Standard 754 for Binary Floating-Point Arithmetic", written by Professor W. Kahan. The second formula is found in the Wikipedia article "Single-precision floating-point format", under the section IEEE …
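The two formulas measure different round trips: ⌊(N − 1) · log10(2)⌋ = 6 is how many decimal digits are guaranteed to survive decimal → float → decimal, while ⌈N · log10(2)⌉ + 1 = 9 digits are needed for float → decimal → float; 7.225 is just the unrounded decimal size of a 24-bit significand. C++'s numeric_limits exposes both rounded bounds, as a quick check shows:

```cpp
#include <cmath>
#include <cstdio>
#include <limits>

int main() {
    const int N = std::numeric_limits<float>::digits;  // 24 significand bits
    std::printf("N*log10(2)   = %.3f\n", N * std::log10(2.0));  // 7.225
    std::printf("digits10     = %d\n",
                std::numeric_limits<float>::digits10);      // 6: dec->bin->dec safe
    std::printf("max_digits10 = %d\n",
                std::numeric_limits<float>::max_digits10);  // 9: bin->dec->bin safe
    return 0;
}
```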

Set specific precision of a BigDecimal

自闭症网瘾萝莉.ら submitted on 2019-11-26 16:38:50
Question: I have an XSD that requires me to use a BigDecimal for a lat/lon. I currently have the lat/lon as doubles and convert them to BigDecimal, but I am only required to use about 12 places of precision. I have not been able to figure out how to set that. Can anyone help me with this? Answer 1: The title of the question asks about precision, but BigDecimal distinguishes between scale and precision. Scale is the number of decimal places. You can think of precision as the number of significant figures, also …

pow() seems to be out by one here

∥☆過路亽.° submitted on 2019-11-26 14:11:08
Question: What's going on here:

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    printf("17^12 = %lf\n", pow(17, 12));
    printf("17^13 = %lf\n", pow(17, 13));
    printf("17^14 = %lf\n", pow(17, 14));
}
```

I get this output:

17^12 = 582622237229761.000000
17^13 = 9904578032905936.000000
17^14 = 168377826559400928.000000

The results for 13 and 14 do not match Wolfram Alpha; cf.:

12: 582622237229761.000000 — 582622237229761
13: 9904578032905936.000000 — 9904578032905937
14: 168377826559400928.000000 — 168377826559400929
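pow computes in double, whose significand holds 53 bits, but 17^13 = 9904578032905937 needs 54; the nearest representable double is one below the true value, and the .000000 printed by %lf merely hides that the double itself is off. Rounding afterwards (e.g. llround) cannot recover the lost bit, so exact integer powers should stay in integer arithmetic; a minimal C++ sketch:

```cpp
#include <cstdint>
#include <cstdio>

// Exact integer power by repeated squaring; the caller must ensure
// the result fits in 64 bits (17^15 still does, 17^16 does not).
std::uint64_t ipow(std::uint64_t base, unsigned exp) {
    std::uint64_t result = 1;
    while (exp) {
        if (exp & 1) result *= base;
        exp >>= 1;
        if (exp) base *= base;
    }
    return result;
}

int main() {
    std::printf("17^13 = %llu\n", (unsigned long long)ipow(17, 13));  // 9904578032905937
    std::printf("17^14 = %llu\n", (unsigned long long)ipow(17, 14));  // 168377826559400929
    return 0;
}
```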

Trouble with floats in Objective-C

心不动则不痛 submitted on 2019-11-26 12:33:23
I have a small problem and I can't find a solution! My code is (this is only sample code, but my original code does something like this):

```objectivec
float x = [@"2.45" floatValue];
for (int i = 0; i < 100; i++)
    x += 0.22;
NSLog(@"%f", x);
```

The output is 52.450001 and not 52.450000! I don't know why this happens! Thanks for any help! ~SOLVED~ Thanks to everybody! Yes, I solved it with the double type! Ralph M. Rickenbach: Floats are a number representation with a certain precision. Not every value can be represented in this format. See here as well. You can easily think of why this would be the case: there is …
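The same experiment in any C-family language makes the "solved with double" outcome concrete: 0.22 has no exact binary representation, and each addition's rounding error is roughly 2^29 times larger in float than in double, so only the float accumulator drifts visibly at %f's six digits. A C++ sketch (exact printed digits may vary slightly by platform):

```cpp
#include <cstdio>

int main() {
    float  xf = 2.45f;
    double xd = 2.45;
    for (int i = 0; i < 100; i++) {
        xf += 0.22f;  // rounds to ~7 decimal digits each time
        xd += 0.22;   // rounds to ~16 decimal digits each time
    }
    std::printf("float:  %f\n", xf);  // error visible at six decimals, like 52.450001
    std::printf("double: %f\n", xd);  // error still present, but far below six decimals
    return 0;
}
```

Note that double does not remove the error; it only pushes it below what %f displays.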

Exactly storing large integers

笑着哭i submitted on 2019-11-26 07:49:45
Question: In the R software,

```r
a <- 123456789123456789123456789
sprintf("%27f", a)
# [1] "123456789123456791337762816.000000"
```

I got the wrong answer. I want the exact value of a. Why is the system showing the wrong value of a? Answer 1: The reason you're not getting your exact value of a is that R is storing it as a double instead of as an integer. Because a is very large, some rounding takes place when you assign a. Normally, to store things as integers you would use L at the end of the number; …
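R's numeric type is an IEEE-754 double, so any language that parses this 27-digit literal into a double rounds it the same way; the C++ sketch below reproduces R's output. Exact storage needs an arbitrary-precision integer (the value is near 10^26, far beyond even 64-bit integers), for example via R's gmp package or a big-integer library:

```cpp
#include <cstdio>

int main() {
    // 27 decimal digits need about 90 bits; a double keeps 53,
    // so the literal is rounded to the nearest representable value.
    double a = 123456789123456789123456789.0;
    std::printf("%.0f\n", a);  // 123456789123456791337762816, as in R
    return 0;
}
```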