precision | 易学教程

Getting the fractional part of a double value in integer without losing precision


Question: I want to convert the fractional part of a double value, with precision up to 4 digits, into an integer, but when I do it I lose precision. Is there any way to get the precise value? #include<stdio.h> int main() { double number; double fractional_part; int output; number = 1.1234; fractional_part = number-(int)number; fractional_part = fractional_part*10000.0; printf("%lf\n",fractional_part); output = (int)fractional_part; printf("%d\n",output); return 0; } I am expecting output to be
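One common way around the truncation in the snippet above is to round to the nearest integer instead of casting. The sketch below shows the idea in Python; `fractional_digits` is a hypothetical helper name, not something from the question, and in C the analogous call would be `lround` from `<math.h>`.

```python
import math

def fractional_digits(number: float, digits: int = 4) -> int:
    """Return the first `digits` fractional digits of `number` as an integer.

    Rounds to nearest instead of truncating, so a stored value such as
    1233.9999999999995 becomes 1234 rather than 1233.
    Note: a fraction like 0.99996 rounds up to 10**digits (10000 here).
    """
    fractional_part = number - math.trunc(number)  # same as number - (int)number in C
    return round(fractional_part * 10 ** digits)

print(fractional_digits(1.1234))  # 1234
```

Rounding only hides errors much smaller than half a unit in the last requested digit, which is the case here because a double carries roughly 15 to 17 significant decimal digits.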

How to generate random double numbers with high precision in C++?


Question: I am trying to generate several series of random double numbers with high precision, for example 0.856365621 (9 digits after the decimal point). I've found some methods on the internet; they do generate random doubles, but the precision is not as good as I need (only 6 digits after the decimal point). How can I achieve my goal? Answer 1: In C++11 you can use the <random> header, and in this specific example, using std::uniform_real_distribution, I am able to generate random
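The "only 6 digits" symptom is often a printing issue rather than a generator issue, since C++ streams default to a precision of 6. As a rough Python sketch of the same idea (the seed and sample count are arbitrary assumptions), the generated doubles already carry more digits and only the format string decides how many are shown:

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
samples = [rng.random() for _ in range(5)]  # uniform doubles in [0.0, 1.0)

# Ask for 9 digits after the decimal point when formatting.
formatted = [f"{x:.9f}" for x in samples]
for text in formatted:
    print(text)
```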

Taking logs and adding versus multiplying


Question: If I want to take the product of a list of floating-point numbers, what is the worst-case/average-case precision lost by adding their logs and then taking exp of the sum, as opposed to just multiplying them? Is there ever a case when this is actually more precise? Answer 1: Absent any overflow or underflow shenanigans, if a and b are floating-point numbers, then the product a*b will be computed to within a relative error of 1/2 ulp. A crude bound on the relative error after multiplying a chain of N
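To get a feel for the two approaches, the following Python sketch measures both against an exact rational reference. The sample values and helper names are assumptions chosen for illustration, and the log-based version only applies to positive inputs.

```python
import math
from fractions import Fraction

def product_direct(values):
    """Multiply left to right; each multiplication rounds once."""
    result = 1.0
    for v in values:
        result *= v
    return result

def product_via_logs(values):
    """Sum the logs (math.fsum limits summation error), then exponentiate."""
    return math.exp(math.fsum(math.log(v) for v in values))

def relative_error(approx, exact):
    """Relative error of a float against an exact Fraction reference."""
    return float(abs(Fraction(approx) - exact) / exact)

values = [1.1, 2.3, 0.7, 5.9, 3.3]  # arbitrary positive sample values
exact = math.prod(Fraction(v) for v in values)  # exact product of the stored doubles

print("direct  :", relative_error(product_direct(values), exact))
print("via logs:", relative_error(product_via_logs(values), exact))
```

Comparing against `Fraction` isolates the rounding introduced by each method from the representation error already present in the inputs.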

Does a floating-point reciprocal always round-trip?


Question: For IEEE-754 arithmetic, is there a guarantee of 0 or 1 units in the last place accuracy for reciprocals? From that, is there a guaranteed error bound on the reciprocal of a reciprocal? Answer 1: [Everything below assumes a fixed IEEE 754 binary format, with some form of round-to-nearest as the rounding mode.] Since the reciprocal (computed as 1/x) is a basic arithmetic operation, 1 is exactly representable, and the arithmetic operations are guaranteed correctly rounded by the standard, the
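An empirical check is easy to run. This Python sketch (the sampling range, seed, and helper name are arbitrary assumptions) counts how many randomly chosen doubles fail to satisfy 1/(1/x) == x; it illustrates the question and is not a proof of any bound.

```python
import random

def reciprocal_round_trips(x: float) -> bool:
    """True when taking the reciprocal twice returns exactly x."""
    return 1.0 / (1.0 / x) == x

rng = random.Random(0)  # fixed seed for repeatability
samples = [rng.uniform(1.0, 2.0) for _ in range(10_000)]
failures = sum(not reciprocal_round_trips(x) for x in samples)

print(f"{failures} of {len(samples)} samples did not round-trip")
```

Powers of two always round-trip, because their reciprocals are exactly representable (barring overflow or underflow at the extremes of the range).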

Converting a precision double to a string


Question: I have a large number in C++ stored as a precise double value (assuming the input 'n' is 75): 2.4891e+109. Is there any way to convert this to a string or an array of each individual digit? Here's my code so far, although it's not entirely relevant to the question: int main(){ double n = 0; cout << "Giz a number: "; cin >> n; double val = 1; for(double i = 1; i <= n; i++){ val = val * i; } //Convert val to string/array here? } Answer 1: std::stringstream str; str << fixed << setprecision( 15 ) <<
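As a rough Python analogue of fixed-notation formatting, the sketch below expands the double into all of its decimal digits and compares the leading digits with the exact integer factorial. Only about the first 15 to 17 digits of the double can be expected to match the true value; the variable names are illustrative assumptions.

```python
import math

n = 75
val = 1.0
for i in range(1, n + 1):  # same loop as in the question, in double precision
    val *= i

digits = format(val, ".0f")              # fixed notation, no exponent
digit_list = [int(ch) for ch in digits]  # one entry per decimal digit
exact = str(math.factorial(n))           # exact integer result for comparison

print(len(digits), "digits from the double")
print("double:", digits[:20], "...")
print("exact :", exact[:20], "...")
```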

“possible loss of precision” is Java going crazy or I'm missing something?


Question: I'm getting a "loss of precision" error when there should be none, AFAIK. This is an instance variable: byte move=0; This happens in a method of this class: this.move=(this.move<<4)|(byte)(Guy.moven.indexOf("left")&0xF); move is a byte, move is still a byte, and the rest is being cast to a byte. I get this error: [javac] /Users/looris/Sviluppo/dumdedum/client/src/net/looris/android/toutry/Guy.java:245: possible loss of precision [javac] found : int [javac] required: byte [javac] this.move=

How do printf and scanf handle floating point precision formats?


Question: Consider the following snippet of code: float val1 = 214.20; double val2 = 214.20; printf("float : %f, %4.6f, %4.2f \n", val1, val1, val1); printf("double: %f, %4.6f, %4.2f \n", val2, val2, val2); Which outputs: float : 214.199997, 214.199997, 214.20 | <- the correct value I wanted double: 214.200000, 214.200000, 214.20 | I understand that 214.20 has an infinite binary representation. The first two elements of the first line have an approximation of the intended value, but the last one
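The behaviour can be reproduced outside C. In this Python sketch, `to_float32` is a hypothetical helper that rounds a double to single precision via `struct`, standing in for the `float` variable; the extra `%.17g` lines show the stored values that the shorter formats round.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (a C double) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", value))[0]

val1 = to_float32(214.20)  # plays the role of `float val1`
val2 = 214.20              # plays the role of `double val2`

print("float : %f, %4.6f, %4.2f" % (val1, val1, val1))
print("double: %f, %4.6f, %4.2f" % (val2, val2, val2))
print("float, 17 significant digits : %.17g" % val1)
print("double, 17 significant digits: %.17g" % val2)
```

Neither variable holds exactly 214.20. The precision in the format only controls how many digits of the stored approximation are printed, and rounding to two decimals happens to land on 214.20 in both cases.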

scipy eigh gives negative eigenvalues for positive semidefinite matrix


Question: I am having some issues with scipy's eigh function returning negative eigenvalues for positive semidefinite matrices. Below is a MWE. The hess_R function returns a positive semidefinite matrix (it is the sum of a rank-one matrix and a diagonal matrix, both with nonnegative entries). import numpy as np from scipy import linalg as LA def hess_R(x): d = len(x) H = np.ones(d*d).reshape(d,d) / (1 - np.sum(x))**2 H = H + np.diag(1 / (x**2)) return H.astype(np.float64) x = np.array([ 9.98510710e-02

x86 80-bit floating point type in Java


Question: I want to emulate the x86 extended precision type and perform arithmetic operations and casts to other types in Java. I could try to implement it using BigDecimal, but covering all the special cases around NaNs, infinity, and casts would probably be a tedious task. I am aware of some libraries that provide other floating-point types with a higher precision than double, but I want to have the same precision as the x86 80-bit float. Is there a Java library that provides such a floating-point type? If

Calculating a round order of magnitude


Question: For a simple project I have to make large numbers (e.g. 4294967123) readable, so I'm writing only the first digits with a prefix (4294967123 -> 4.29G, 12345 -> 12.34K etc.). The code (simplified) looks like this: const char* postfixes=" KMGT"; char postfix(unsigned int x) { return postfixes[(int) floor(log10(x))]; } It works, but I think that there's a more elegant/better solution than computing the full-precision logarithm, rounding it and casting it down to an int again. Other solutions I
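One logarithm-free alternative is to divide by 1000 in a loop, which yields the postfix index directly (one step per three decimal digits). The Python sketch below uses integer arithmetic only; `humanize` is a hypothetical name, and the two-decimal truncation follows the examples given in the question.

```python
def humanize(x: int) -> str:
    """Format a non-negative integer with a K/M/G/T postfix, truncated to 2 decimals."""
    postfixes = " KMGT"
    index = 0
    scaled = x
    # Each division by 1000 moves one step along the postfix table.
    while scaled >= 1000 and index < len(postfixes) - 1:
        scaled //= 1000
        index += 1
    # Integer arithmetic avoids floating-point rounding: work in hundredths.
    hundredths = x * 100 // 1000 ** index
    whole, frac = divmod(hundredths, 100)
    return f"{whole}.{frac:02d}{postfixes[index]}".strip()

print(humanize(4294967123))  # 4.29G
print(humanize(12345))       # 12.34K
```

In C the same loop works on an `unsigned int`, although the multiplication by 100 would then need a wider type to avoid overflow.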