floating-accuracy | 易学教程

define double constant as hexadecimal?

Question: I would like to have the closest number below 1.0 as a floating-point value. From Wikipedia's article on IEEE-754 I found that the binary representation of 1.0 is 3FF0000000000000, so the closest double below it is 0x3FEFFFFFFFFFFFFF. The only way I know to initialize a double with this binary data is: double a; *((unsigned*)(&a) + 1) = 0x3FEFFFFF; *((unsigned*)(&a) + 0) = 0xFFFFFFFF; which is rather cumbersome to use. Is there any better way to define this
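
The bit pattern quoted in the question can be written directly in hexadecimal floating-point notation. The sketch below uses Python only to illustrate the notation; C99 accepts a constant of the same form (0x1.fffffffffffffp-1). The variable names are illustrative and not part of the original question.

```python
import struct

# Hexadecimal floating-point notation: significand 1.fffffffffffff (hex), scaled by 2**-1.
below_one = float.fromhex("0x1.fffffffffffffp-1")

# The same value built from the raw IEEE-754 bit pattern quoted in the question
# (big-endian byte order, so the hex string reads left to right).
from_bits = struct.unpack(">d", bytes.fromhex("3FEFFFFFFFFFFFFF"))[0]

print(below_one.hex())                  # hex form of the stored value
print(below_one == from_bits)           # both constructions give the same double
print(1.0 - below_one == 2.0 ** -53)    # distance from 1.0 to the double just below it
```

Unlike the pointer cast in the question, neither construction depends on the machine's byte order or on the width of unsigned.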

Dividing a double with integer

Question: I am facing an issue while dividing a double by an int. The code snippet is: double db = 10; int fac = 100; double res = db / fac; The value of res is 0.10000000000000001 instead of 0.10. Does anyone know the reason for this? I am using cc to compile the code. Answer 1: You need to read the classic paper What Every Computer Scientist Should Know About Floating-Point Arithmetic. Answer 2: The CPU uses a binary representation of numbers. Your result cannot be represented exactly in binary. 0.1 in
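
The effect described in the answers can be reproduced in any language that uses IEEE-754 doubles. The following is a minimal Python sketch that mirrors the variable names from the question; it shows that the quotient is simply the nearest double to 0.1 and that the extra digits appear only when 17 significant digits are requested.

```python
from decimal import Decimal

db = 10.0
fac = 100
res = db / fac

print(res == 0.1)             # the quotient is the same double as the literal 0.1
print(format(res, ".17g"))    # 17 significant digits expose the stored value
print(format(res, ".2f"))     # printing fewer digits rounds back to the expected text
print(Decimal(res))           # exact decimal expansion of the stored double
```

The stored value is slightly above one tenth, which is why the 17-digit form ends in a 1.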

Does floating point sqrt() function guarantee order relation

Question: Given two floating-point numbers x and y, suppose all floating-point arithmetic conforms to the IEEE 754 standard and we have a certain implementation of the square root function sqrt(). If x < y, must sqrt(x) <= sqrt(y) hold? If sqrt(x) < sqrt(y), must x <= y hold? Let a and b be two (exact) real numbers, and x = op(a), y = op(b), where op() denotes rounding a real number to its floating-point representation. Then the following question: (* means floating point
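
IEEE 754 requires sqrt to be correctly rounded, and rounding a monotonic function to the nearest representable value preserves order in the non-strict sense. The sketch below assumes that Python's math.sqrt is backed by such a correctly rounded platform sqrt; the helper name is hypothetical, and the random spot check is an illustration, not a proof.

```python
import math
import random

one_plus_ulp = 1.0 + 2.0 ** -52   # smallest double above 1.0

# Two distinct inputs can round to the same square root, so only <= can be guaranteed.
print(math.sqrt(1.0) == math.sqrt(one_plus_ulp))

def sqrt_is_monotonic_on_samples(n, seed=0):
    """Spot-check that x < y implies sqrt(x) <= sqrt(y) on n random pairs."""
    rng = random.Random(seed)
    for _ in range(n):
        x, y = sorted((rng.uniform(0.0, 1e6), rng.uniform(0.0, 1e6)))
        if x < y and not math.sqrt(x) <= math.sqrt(y):
            return False
    return True

print(sqrt_is_monotonic_on_samples(10_000))
```

The second property in the question follows from the first by contraposition: if x > y then sqrt(x) >= sqrt(y), so sqrt(x) < sqrt(y) rules out x > y.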

Formatting floating-point numbers without loss of precision in AngularJS

Question: In AngularJS, how do I output a floating-point number on an HTML page without loss of precision and without unnecessary padding with zeros? I've considered the "number" ng-filter (https://docs.angularjs.org/api/ng/filter/number), but the fractionSize parameter forces a fixed number of decimals: {{ number_expression | number : fractionSize}} I'm looking for what various other languages refer to as "exact reproducibility", "canonical string representation", repr, round-trip, etc. but I
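
The "repr" and "round-trip" behaviour the question refers to can be illustrated in Python, where repr of a float yields the shortest digit string that parses back to the same value. This is only a sketch of the concept, not an AngularJS solution; the helper name is hypothetical.

```python
def shortest_round_trip(x):
    """Return repr(x) after checking that it parses back to exactly the same float."""
    text = repr(x)
    if float(text) != x:
        raise ValueError("repr did not round-trip")
    return text

print(shortest_round_trip(0.1))         # no padding, no spurious digits
print(shortest_round_trip(0.1 + 0.2))   # as many digits as needed to identify the value
print(format(0.1, ".3f"))               # a fixed fraction size pads with zeros instead
```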

Alternative to C++11's std::nextafter and std::nexttoward for C++03?

Question: As the title says, the functionality I'm after is provided by C++11's math library: finding the next floating-point value towards a particular value. Aside from pulling the code out of the standard library (which I may have to resort to), are there any alternatives for doing this in C++03 (using GCC 4.4.6)? Answer 1: In a platform-dependent way, assuming IEEE 754 and modulo endianness, you can store the bits of the floating-point number in an integer, increment it by one, and retrieve the result: float input = 3.15; uint32
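
The bit-increment idea from the answer can be sketched as follows. This is a Python illustration of the technique under the same IEEE 754 assumption, not the answer's C++ code; the function name is hypothetical, and the sketch handles only finite, non-negative inputs stepping upward.

```python
import struct

def next_up_float32(x):
    """Next float32 value above x, for finite x >= +0.0 only (sketch)."""
    # Packing rounds x to float32; unpacking as an unsigned int exposes its bits.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits >= 0x7F800000:
        # Covers negative values, -0.0, infinities and NaNs, which need separate handling.
        raise ValueError("only finite, non-negative inputs are handled")
    # For non-negative floats, adjacent values have adjacent bit patterns.
    (up,) = struct.unpack("<f", struct.pack("<I", bits + 1))
    return up

print(next_up_float32(1.0) - 1.0)   # spacing of float32 values just above 1.0
print(next_up_float32(3.15))
```

Incrementing the largest finite float32 yields infinity, which matches what a next-value function is expected to return there. Negative inputs would need the integer decremented instead.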

Change in Python built in round() function between 2.4 and 2.7

Question: Has the built-in round() function in Python changed between 2.4 and 2.7? Python 2.4: Python 2.4.6 (#1, Feb 12 2009, 14:52:44) [GCC 3.4.6 20060404 (Red Hat 3.4.6-8)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> f = 1480.39499999999998181010596454143524169921875 >>> round(f,2) 1480.4000000000001 >>> Python 2.7: Python 2.7.1 (r271:86832, May 13 2011, 08:14:41) [GCC 3.4.6 20060404 (Red Hat 3.4.6-11)] on linux2 Type "help", "copyright", "credits" or "license
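
The value in the question lies slightly below the decimal midpoint 1480.395, which is what makes the two results differ. The sketch below assumes a current Python 3 interpreter (the 2.7 output is cut off in the excerpt above) and shows the exact stored value, the built-in result, and a decimal alternative for when half-up rounding of the written digits is wanted.

```python
from decimal import Decimal, ROUND_HALF_UP

f = 1480.39499999999998181010596454143524169921875   # literal from the question

print(Decimal(f))     # exact stored value, slightly below 1480.395
print(round(f, 2))    # rounds the stored value, so the result goes down

# Rounding the decimal text instead of the binary double gives the half-up result.
half_up = Decimal("1480.395").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(half_up)
```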

Minimize floating point error when adding multiple floating point variables

Question: In my C++ app I have a vector of doubles in the range (0,1), and I have to calculate its total as accurately as possible. It feels like this issue should have been addressed before, but I can't find anything. Obviously, iterating through each item in the vector and doing sum+=vect[i] accumulates a significant error if the vector is large and some items are significantly smaller than the others. My current solution is this function: double sumDoubles(vector<double> arg)// pass by
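
One well-known way to reduce the accumulated error is compensated (Kahan) summation, which carries the rounding error of each addition into the next step. The sketch below is a Python illustration, not the asker's sumDoubles function (which is cut off above); the function names are hypothetical, and math.fsum is used as a correctly rounded reference.

```python
import math

def naive_sum(values):
    """Plain left-to-right accumulation, as in sum += vect[i]."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Compensated summation: track the low-order bits lost by each addition."""
    total = 0.0
    compensation = 0.0              # running estimate of the lost low-order part
    for v in values:
        y = v - compensation        # apply the correction from the previous step
        t = total + y               # low-order bits of y may be lost here
        compensation = (t - total) - y   # recover what was lost (algebraically zero)
        total = t
    return total

values = [0.1] * 10
print(naive_sum(values))
print(kahan_sum(values))
print(math.fsum(values))            # correctly rounded sum of the stored doubles
```

The same loop translates directly to C++, provided the compiler is not allowed to reassociate floating-point operations (for example via fast-math style options), since that can optimize the compensation term away.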

pow() function in C problems [duplicate]

Question: This question already has answers here: Strange behaviour of the pow function (5 answers). Closed last year. I am having some problems with the pow() function in C. Whenever I run this code with 153 as input, the sum evaluates to 152. However, if I don't use pow() and instead use a for loop to compute N^n, the sum evaluates to 153. Can anyone please help explain this difference? #include <stdio.h> #include <string.h> #include <stdlib.h> #include <math.h> int main(void) {
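
A common explanation for an off-by-one like this is that a floating-point power comes back slightly below the exact integer and is then truncated when converted to an integer type. Because the question's code is cut off, that cause is an assumption here. The Python sketch below stands in for such a result with the double just below 125.0 and contrasts truncation with exact integer powers; the function name is hypothetical.

```python
def digit_power_sum(n):
    """Sum of each decimal digit raised to the digit count, using exact integer powers."""
    digits = str(n)
    return sum(int(d) ** len(digits) for d in digits)

# The double just below 125.0, standing in for an inexact pow(5, 3) result.
almost_125 = 125.0 - 2.0 ** -46

print(int(almost_125))       # truncation toward zero loses one
print(round(almost_125))     # rounding to nearest recovers the intended integer
print(digit_power_sum(153))  # integer arithmetic avoids the problem entirely
```

In C, the equivalent choices are rounding the pow() result before converting it or computing the power with an integer loop, as the asker already observed.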

Dividing and multiplying Decimal objects in Python

Question: In the following code, both coeff1 and coeff2 are Decimal objects. When I check their type using type(coeff1), I get (class 'decimal.Decimal'), but when I wrote a test script and checked Decimal objects there, I got decimal.Decimal, without the word class: coeff1 = system[i].normal_vector.coordinates[i] coeff2 = system[m].normal_vector.coordinates[i] x = coeff2/coeff1 print(type(x)) system.xrow_add_to_row(x,i,m) Another issue is when I change the first input to the function xrow_add_to_row to negative x
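
Division and negation of Decimal objects both return Decimal objects, which can be checked in isolation. The sketch below uses hypothetical stand-in values for coeff1 and coeff2, since the question's system object is not shown.

```python
from decimal import Decimal

# Hypothetical stand-ins for the coefficients read from the question's system object.
coeff1 = Decimal("3")
coeff2 = Decimal("1.5")

x = coeff2 / coeff1
print(type(x))                  # printing a type shows the class wrapper text
print(x, -x)                    # quotient and its negation
print(isinstance(-x, Decimal))  # negating a Decimal keeps the type
```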