floating-accuracy

define double constant as hexadecimal?

∥☆過路亽.° submitted on 2020-01-03 11:24:58
Question: I would like to have the closest number below 1.0 as a floating point. By reading Wikipedia's article on IEEE-754 I have managed to find out that the binary representation for 1.0 is 3FF0000000000000, so the closest double value is actually 0x3FEFFFFFFFFFFFFF. The only way I know of to initialize a double with this binary data is this: double a; *((unsigned*)(&a) + 1) = 0x3FEFFFFF; *((unsigned*)(&a) + 0) = 0xFFFFFFFF; which is rather cumbersome to use. Is there any better way to define this
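
One way to avoid the pointer casts is a hexadecimal floating-point literal; C99 accepts 0x1.fffffffffffffp-1 directly as a double constant. The sketch below uses Python rather than the question's C, and only checks that this literal and the bit pattern 0x3FEFFFFFFFFFFFFF from the question denote the same value. It assumes IEEE-754 binary64 doubles; the variable names are illustrative.

```python
import struct

# Bit pattern from the question: the largest double below 1.0.
BITS = 0x3FEFFFFFFFFFFFFF

# Reinterpret the 64-bit integer as a double (same byte order on both sides).
from_bits = struct.unpack(">d", struct.pack(">Q", BITS))[0]

# The same value written as a hexadecimal floating-point literal.
from_hex = float.fromhex("0x1.fffffffffffffp-1")

print(from_bits == from_hex)  # True
print(from_bits < 1.0)        # True
print(from_bits.hex())        # 0x1.fffffffffffffp-1
```

The literal form needs no assumption about endianness or about the size of unsigned, which the two-store version in the question does.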

Dividing a double with integer

↘锁芯ラ submitted on 2020-01-03 05:43:05
Question: I am facing an issue while dividing a double by an int. The code snippet is: double db = 10; int fac = 100; double res = db / fac; The value of res is 0.10000000000000001 instead of 0.10. Does anyone know the reason for this? I am using cc to compile the code.
Answer 1: You need to read the classic paper What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Answer 2: The CPU uses a binary representation of numbers. Your result cannot be represented exactly in binary. 0.1 in
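
The effect described in the answers can be reproduced outside C. The following Python sketch (an illustration, not the asker's code) performs the same division and prints the result at three levels of detail; it assumes IEEE-754 binary64 doubles.

```python
from decimal import Decimal

db = 10.0
fac = 100
res = db / fac

# 17 significant digits expose the double nearest to 0.1.
print(format(res, ".17g"))  # 0.10000000000000001
# Rounding to two decimals for display gives the expected text.
print(format(res, ".2f"))   # 0.10
# Decimal(res) shows the exact value stored in the double.
print(Decimal(res))
```

The division itself is as accurate as the format allows; only the choice of how many digits to print differs.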

Does floating point sqrt() function guarantee order relation

橙三吉。 submitted on 2020-01-03 02:23:06
Question: Given two floating-point numbers x and y, suppose all floating-point arithmetic conforms to the IEEE 754 standard, and take a certain implementation of the square root function sqrt(). If x < y, must sqrt(x) <= sqrt(y) hold? If sqrt(x) < sqrt(y), must x <= y hold? Let a and b be two (exact) real numbers, and let x = op(a), y = op(b), where op() denotes rounding a real number to its floating-point representation. Then the following question: (* means floating point
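
A small experiment can make the two inequalities concrete. The Python sketch below assumes that math.sqrt is correctly rounded, as IEEE 754 requires of squareRoot and as is typical on mainstream hardware; it is an empirical check, not a proof. It shows one pair with x < y but sqrt(x) == sqrt(y), then counts order violations over random pairs.

```python
import math
import random

x = 1.0
y = 1.0 + 2.0 ** -52  # the next double above 1.0

print(x < y)                         # True
print(math.sqrt(x) == math.sqrt(y))  # True: strict order is not preserved

# Look for any pair a <= b with sqrt(a) > sqrt(b).
rng = random.Random(0)
violations = 0
for _ in range(100_000):
    a, b = sorted((rng.uniform(0.0, 1e6), rng.uniform(0.0, 1e6)))
    if math.sqrt(a) > math.sqrt(b):
        violations += 1
print(violations)
```

The first pair is why the question asks about <= rather than <: distinct inputs can round to the same square root.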

Formatting floating-point numbers without loss of precision in AngularJS

孤街醉人 submitted on 2020-01-02 07:25:16
Question: In AngularJS, how do I output a floating-point number on an HTML page without loss of precision and without unnecessary padding with 0's? I've considered the "number" ng-filter (https://docs.angularjs.org/api/ng/filter/number) but the fractionSize parameter causes a fixed number of decimals: {{ number_expression | number : fractionSize}} I'm looking for what in various other languages is referred to as "exact reproducibility", "canonical string representation", repr, round-trip, etc. but I
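
To illustrate what "round-trip" means here, the sketch below uses Python's repr, one of the facilities the question alludes to, rather than AngularJS. It checks that the printed text parses back to exactly the same double; the sample values are arbitrary.

```python
values = [0.1, 1.0 / 3.0, 1480.395, 1e21]

for v in values:
    text = repr(v)                # shortest string that identifies v
    round_tripped = float(text)   # parse the text back into a double
    print(text, round_tripped == v)
```

No fixed number of decimals is involved: each value gets only as many digits as are needed to identify it uniquely.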

Alternative to C++11's std::nextafter and std::nexttoward for C++03?

会有一股神秘感。 submitted on 2020-01-02 07:10:11
Question: As the title says, the functionality I'm after is provided by C++11's math libraries to find the next floating-point value towards a particular value. Aside from pulling the code out of the std library (which I may have to resort to), are there any alternatives to do this with C++03 (using GCC 4.4.6)?
Answer 1: Platform-dependently, assuming IEEE 754 and modulo endianness, you can store the data of the floating-point number in an integer, increment it by one, and retrieve the result: float input = 3.15; uint32
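
The integer-increment idea from the answer can be sketched as follows, in Python rather than C++03. The helper names next_up_float32 and to_float32 are hypothetical, and the increment is only valid for positive, finite inputs below the largest float32; negative values, infinities, and NaN need separate handling.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def next_up_float32(x):
    """Next float32 above x (assumes x is positive, finite, and not the float32 maximum)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 -> raw 32-bit integer
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]  # integer + 1 -> float32


print(to_float32(3.15))
print(next_up_float32(3.15))
print(next_up_float32(1.0) == 1.0 + 2.0 ** -23)  # True
```

Packing and unpacking with the same byte order on both sides keeps the sketch independent of the host's endianness.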

Change in Python built in round() function between 2.4 and 2.7

做~自己de王妃 submitted on 2020-01-02 05:48:05
Question: Has the built-in round() function in Python changed between 2.4 and 2.7?
Python 2.4:
    Python 2.4.6 (#1, Feb 12 2009, 14:52:44)
    [GCC 3.4.6 20060404 (Red Hat 3.4.6-8)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> f = 1480.39499999999998181010596454143524169921875
    >>> round(f,2)
    1480.4000000000001
    >>>
Python 2.7:
    Python 2.7.1 (r271:86832, May 13 2011, 08:14:41)
    [GCC 3.4.6 20060404 (Red Hat 3.4.6-11)] on linux2
    Type "help", "copyright", "credits" or "license
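
The value in the question sits just below the decimal midpoint 1480.395, which is why different rounding algorithms can disagree about it. The sketch below assumes a current CPython 3 interpreter, not the 2.4 or 2.7 builds from the question, and shows the exact stored value next to what round() returns there.

```python
from decimal import Decimal

f = 1480.39499999999998181010596454143524169921875

# The exact value stored in the double.
print(Decimal(f))
# It lies strictly below the midpoint 1480.395.
print(Decimal(f) < Decimal("1480.395"))  # True
# Rounding based on the exact stored value goes down.
print(round(f, 2))                       # 1480.39
```

An implementation that scales by 100, adds 0.5, and truncates can instead land on 1480.4, because the multiplication itself rounds up to the midpoint.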

Minimize floating point error when adding multiple floating point variables

巧了我就是萌 submitted on 2020-01-01 10:16:14
Question: In my C++ app I have a vector of doubles in the range (0,1) and I have to calculate its total as accurately as possible. It feels like this issue should have been addressed before, but I can't find anything. Obviously, iterating through each item in the vector and doing sum+=vect[i] accumulates a significant error if the vector is large and some items are significantly smaller than the others. My current solution is this function: double sumDoubles(vector<double> arg)// pass by
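
Two standard techniques for this problem are compensated (Kahan) summation and exact summation. The sketch below is in Python, not the asker's C++, and is not the asker's sumDoubles function; naive_sum and kahan_sum are illustrative names, while math.fsum is the standard-library routine that returns the correctly rounded sum.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation, as in sum += vect[i]."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation
        t = total + y
        compensation = (t - total) - y  # what was lost when forming t
        total = t
    return total


values = [0.1] * 10
print(naive_sum(values))  # 0.9999999999999999
print(kahan_sum(values))
print(math.fsum(values))  # 1.0
```

The Kahan loop translates line for line to C++, provided the compiler is not allowed to reassociate floating-point arithmetic (for example, no -ffast-math).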

pow() function in C problems [duplicate]

谁说胖子不能爱 submitted on 2019-12-31 06:49:05
Question: This question already has answers here: Strange behaviour of the pow function (5 answers). Closed last year. I am having some problems with the pow() function in C. Whenever I run this code with 153 as input, the sum evaluates to 152. However, if I don't use the pow() function and instead use a for loop to get the value of N^n, the sum evaluates to 153. Can anyone please help explain this difference?
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <math.h>
    int main(void) {
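
One plausible cause, assuming the platform's pow() returns a value a hair below the exact integer, is truncation when that double is converted to int. The Python sketch below simulates such a result with a hypothetical value (slightly_low) and contrasts truncation, rounding, and an integer-only power loop like the one the asker describes; int_pow is an illustrative helper.

```python
def int_pow(base, exponent):
    """Integer power by repeated multiplication; no floating point involved."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result


# Hypothetical pow(5, 3) result: the largest double below 125.
slightly_low = 125.0 - 2.0 ** -46

print(int(slightly_low))    # 124: conversion to int truncates
print(round(slightly_low))  # 125: rounding recovers the intended integer

# Sum of cubed digits of 153 using integer arithmetic only.
digits = [int(d) for d in "153"]
armstrong_total = sum(int_pow(d, len(digits)) for d in digits)
print(armstrong_total)      # 153
```

A single term that is off by one after truncation is enough to turn 153 into 152.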

Dividing and multiplying Decimal objects in Python

非 Y 不嫁゛ submitted on 2019-12-31 05:38:14
Question: In the following code, both coeff1 and coeff2 are Decimal objects. When I check their type using type(coeff1), I get (class 'decimal.Decimal'), but when I wrote some test code and checked Decimal objects I got decimal.Decimal, without the word class.
    coeff1 = system[i].normal_vector.coordinates[i]
    coeff2 = system[m].normal_vector.coordinates[i]
    x = coeff2/coeff1
    print(type(x))
    system.xrow_add_to_row(x,i,m)
Another issue is when I change the first input to the function xrow_add_to_row to negative x
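
As a point of comparison, the sketch below uses stand-in values for coeff1 and coeff2 (the asker's system object is not available, so these are assumptions). It shows that on Python 3, dividing and negating Decimal objects both yield Decimal, and that mixing a Decimal with a float in arithmetic raises TypeError.

```python
from decimal import Decimal

# Stand-in values; the real ones come from the asker's system object.
coeff1 = Decimal("3")
coeff2 = Decimal("-1.5")

x = coeff2 / coeff1
print(type(x))   # <class 'decimal.Decimal'>
print(x, -x)     # -0.5 0.5

# Arithmetic between Decimal and float is rejected.
try:
    x * 2.0
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```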