precision

Decimal places in SQL

安稳与你 posted on 2019-12-10 15:31:12
Question: I am calculating percentages. One example comes down to 38589/38400, so the percentage is 100*(38589/38400), which equals 100.4921875, but the result shows up as 100. How can I get it displayed with x decimal places? Similarly, will the same approach work if I'd like 2 to be displayed as 2.000000? Thanks!
Answer 1: You can cast it to a specific data type, which fixes the result type and rounds to the given precision. Make one operand a decimal first; otherwise an engine that does integer division truncates 38589/38400 to 1 before the cast is applied: select cast(100.0*38589/38400 as decimal(10,4)) FYI
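The result of 100 in the question is what integer division produces, and the effect is easy to reproduce outside SQL. The following is a small Python sketch, not SQL: `//` stands in for the integer division that some engines apply to integer operands, and the `percentage` helper is a hypothetical name used only for illustration.

```python
def percentage(numerator: int, denominator: int, decimals: int) -> str:
    """Format 100 * numerator / denominator with a fixed number of decimals."""
    # True division (/) keeps the fractional part; // would truncate it.
    value = 100 * numerator / denominator
    return f"{value:.{decimals}f}"


truncated = 100 * (38589 // 38400)       # integer division happens first
formatted = percentage(38589, 38400, 4)  # fractional part survives
padded = f"{2:.6f}"                      # an integer padded with zeros

print(truncated, formatted, padded)
```

The same two ideas carry over to SQL: keep the arithmetic out of integer types, then fix the number of decimals at the end.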

Why does AVX512-IFMA support only 52-bit ints?

只谈情不闲聊 posted on 2019-12-10 15:17:39
Question: From the value we can infer that it uses the same components as double-precision floating-point hardware. But double has 53 bits of mantissa, so why is AVX512-IFMA limited to 52 bits?
Answer 1: IEEE-754 double precision actually has only 52 explicitly stored bits; the 53rd bit (the most significant bit) is an implicit 1.
Source: https://stackoverflow.com/questions/28862012/why-does-avx512-ifma-support-only-52-bit-ints
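The split between 52 stored bits and one implicit bit can be checked by looking at the raw encoding of a double. This is a minimal Python sketch using the standard `struct` module; `fraction_field` is a hypothetical helper name, and the code says nothing about how AVX512-IFMA itself is implemented.

```python
import struct
import sys

FRACTION_BITS = 52  # explicitly stored fraction bits in an IEEE-754 binary64


def fraction_field(x: float) -> int:
    """Return the 52-bit stored fraction field of a double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits & ((1 << FRACTION_BITS) - 1)


# 1.0 has significand 1.0, yet nothing is stored: the leading 1 is implicit.
print(fraction_field(1.0))
# 1.5 is 1.1 in binary: only the bit after the binary point is stored.
print(fraction_field(1.5) == 1 << 51)
# Python reports 53 bits of precision: 52 stored plus 1 implicit.
print(sys.float_info.mant_dig)
```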

output to stream float numbers with precision

折月煮酒 posted on 2019-12-10 14:56:04
Question: I have a problem with the precision of floating-point output:

    #include <iomanip>
    #include <iostream>

    int main(void) {
        double b = 106.829599;
        float a = b;
        std::cerr << std::setprecision(6) << "a = " << a << "; b = " << b << std::endl;
        std::cerr << std::setprecision(7) << "a = " << a << "; b = " << b << std::endl;
    }

The result is:

    a = 106.83; b = 106.83
    a = 106.8296; b = 106.8296

So my question is: why are the numbers in the first line so short? (I was expecting to see 106.829.) This is with gcc 4.1.2; I also ran a test at LWS.
Answer 1: Actually, 106.829599 rounded to 6 digits
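With the default stream format, the precision counts significant digits (as with printf's `%g`) rather than digits after the decimal point, and trailing zeros are dropped. The Python sketch below reproduces that arithmetic with the `g` and `f` format types; it is an illustration of the rounding, not C++ code.

```python
value = 106.829599

six = format(value, ".6g")    # 6 significant digits, trailing zeros dropped
seven = format(value, ".7g")  # 7 significant digits
fixed = format(value, ".3f")  # 3 digits after the decimal point

print(six, seven, fixed)
```

Rounded to 6 significant digits the value is 106.830, which prints as 106.83 once the trailing zero is removed. In C++, `std::fixed` switches the stream so that the precision means digits after the decimal point.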

tf.round() to a specified precision

吃可爱长大的小学妹 posted on 2019-12-10 14:23:07
Question: tf.round(x) rounds the values of x to integer values. Is there any way to round to, say, 3 decimal places instead?
Answer 1: You can do it easily like this, as long as the numbers don't get too large:

    def my_tf_round(x, decimals=0):
        multiplier = tf.constant(10**decimals, dtype=x.dtype)
        return tf.round(x * multiplier) / multiplier

Note: the value of x * multiplier should not exceed 2^32, so this method should not be used to round very large numbers.
Source: https://stackoverflow.com/questions
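The scale, round, unscale pattern from the answer can be tried without TensorFlow. The sketch below uses plain Python floats and the built-in `round` as a stand-in for `tf.round`; `round_to_decimals` is a hypothetical name, and the result is still a binary float, so it is only the nearest representable value to the rounded decimal.

```python
def round_to_decimals(x: float, decimals: int = 0) -> float:
    """Round x to the given number of decimal places by scaling."""
    multiplier = 10 ** decimals
    # Scale up, round to the nearest integer, then scale back down.
    return round(x * multiplier) / multiplier


print(round_to_decimals(3.14159, 3))
print(round_to_decimals(2.71828, 2))
```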

Better approximation of e with Java

吃可爱长大的小学妹 posted on 2019-12-10 14:17:26
Question: I would like to approximate the value of e to any desired precision. What is the best way to do this? The most I've been able to get is e = 2.7182818284590455. Any examples that modify the following code would be appreciated.

    public static long fact(int x) {
        long prod = 1;
        for (int i = 1; i <= x; i++)
            prod = prod * i;
        return prod;
    } // fact

    public static void main(String[] args) {
        double e = 1;
        for (int i = 1; i < 50; i++)
            e = e + 1 / (double) (fact(i));
        System.out.print("e = " + e);
    } // main
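A double carries only about 16 significant decimal digits, so going further needs an arbitrary-precision type. The following is one possible sketch in Python using the standard `decimal` module and the same series, the sum of 1/k!; the function name and the number of guard digits are assumptions chosen for illustration, not something taken from the question.

```python
from decimal import Decimal, localcontext


def approximate_e(digits: int) -> Decimal:
    """Approximate e to `digits` significant digits via the series sum 1/k!."""
    with localcontext() as ctx:
        ctx.prec = digits + 10                 # extra guard digits while summing
        total = Decimal(1)                     # the k = 0 term
        term = Decimal(1)
        cutoff = Decimal(10) ** -(digits + 5)  # stop once terms are negligible
        k = 0
        while term > cutoff:
            k += 1
            term /= k                          # term is now 1/k!
            total += term
        ctx.prec = digits
        return +total                          # unary plus rounds to ctx.prec


print(approximate_e(20))
print(approximate_e(50))
```

Each term is derived from the previous one by a single division, which also avoids computing large factorials separately.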

Can float be round tripped via double without losing precision?

白昼怎懂夜的黑 posted on 2019-12-10 14:16:28
Question: If I have a C# float, can I convert it to double without losing any precision? If that double were converted back to float, would it have exactly the same value?
Answer 1: Yes. IEEE 754 floating point (which is what C# must use) guarantees this: converting a float to a double preserves exactly the same value, and converting that double back to a float recovers exactly the original float. The set of doubles is a superset of the set of floats. Note that this also applies to NaN, +Infinity, and -Infinity.
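The round trip can be observed in any language with IEEE 754 types. Here is a small Python sketch in which `struct` emulates a 32-bit float (Python's own `float` is a double); `to_float32` is a hypothetical helper, and this is an illustration rather than C# code.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


f = to_float32(0.1)   # a float32 value, now held exactly in a double
back = to_float32(f)  # narrowing it again changes nothing

print(back == f)      # float -> double -> float is lossless
print(f == 0.1)       # the reverse direction is not: double 0.1 is not a float32
print(to_float32(math.inf), math.isnan(to_float32(math.nan)))
```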

Typecasting std::complex<double> to __complex128

久未见 posted on 2019-12-10 13:49:08
Question: I'm trying to use the quadmath library in GCC. I have a complex double value I'd like to typecast into the corresponding quad-precision complex number, __complex128. The following is a minimal (non-)working example:

    #include <quadmath.h>
    #include <complex>
    #include <stdio.h>
    using namespace std::complex_literals;

    int main(){
        std::complex<double> x = 1 + 2i;
        std::printf("x = %5.5g + %5.5g\n", x.real(), x.imag());
        __complex128 y = 2+2i;
        y = x;
        return 0;
    }

When I try compiling this code with g+

Compare a 32 bit float and a 32 bit integer without casting to double, when either value could be too large to fit the other type exactly

。_饼干妹妹 posted on 2019-12-10 13:33:30
Question: I have a 32 bit floating point number f (known to be positive) that I need to convert to a 32 bit unsigned integer. Its magnitude might be too large to fit. Furthermore, there is downstream computation that requires some headroom. I can compute the maximum acceptable value m as a 32 bit integer. How do I efficiently determine, in C++11 on a constrained 32 bit machine (ARM M4F), whether f <= m holds mathematically? Note that the types of the two values don't match. The following three approaches each have
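The difficulty named in the title can be made concrete: converting a large integer to a 32-bit float may round it up, so comparing in float can give the wrong answer. The Python sketch below only illustrates that pitfall; it emulates float32 with `struct`, relies on Python comparing an `int` with a `float` exactly, and does not address the ARM M4F efficiency constraint. `to_float32` is a hypothetical helper.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


m = 2**32 - 1     # largest 32-bit unsigned integer
f = 4294967296.0  # 2**32, exactly representable as a float32

m_as_float32 = to_float32(float(m))  # rounds up to 2**32
naive = f <= m_as_float32            # comparison after the lossy conversion
exact = f <= m                       # exact int/float comparison

print(m_as_float32, naive, exact)
```

Here the float comparison reports that f fits although f is mathematically larger than m.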

How to use “%f” to populate a double value into a string with the right precision

时光总嘲笑我的痴心妄想 posted on 2019-12-10 13:25:23
Question: I am trying to populate a string with a double value using sprintf like this: sprintf(S, "%f", val); But the output is cut off at six decimal places. I need about 10 decimal places. How can that be achieved?
Answer 1: The format is %[width].[precision]f. The width is the minimum total field width and includes the decimal point. %8.2f means 8 characters wide: 5 digits before the point and 2 after, with one character reserved for the point. 5 + 1 + 2 = 8
Answer 2: What you want is a precision modifier: sprintf(S, "%.10f", val); man
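Python's `%` operator accepts the same printf-style width and precision fields for `f`, so the behaviour described in the answers can be tried quickly. This is a small sketch in Python, not C; the sample value is arbitrary.

```python
val = 3.141592653589793

default = "%f" % val    # no precision given: six decimal places
ten = "%.10f" % val     # precision 10: ten decimal places
padded = "%8.2f" % val  # minimum width 8, two decimal places, left-padded

print(default)
print(ten)
print(repr(padded))
```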

Interpreting a 32bit unsigned long as Single Precision IEEE-754 Float in C

試著忘記壹切 posted on 2019-12-10 12:46:50
Question: I am using the XC32 compiler from Microchip, which is based on the standard C compiler. I am reading a 32-bit value from a device on an RS485 network and storing it in an unsigned long that I have typedef'ed as DWORD, i.e. typedef unsigned long DWORD; As it stands, when I typecast this value to a float, the value I get is basically the floating-point version of its integer representation and not the proper IEEE-754 interpreted float, i.e. DWORD dword_value = readValueOnRS485(); float temp =
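The difference between converting a value and reinterpreting its bits can be shown with a known bit pattern. The Python sketch below uses `struct` to reinterpret a 32-bit word as an IEEE-754 single; `bits_to_float` and the sample word are assumptions for illustration. In C, a common way to get the same effect is to `memcpy` the four bytes into a `float`, though whether that suits the XC32 setup is not confirmed by the source.

```python
import struct


def bits_to_float(word: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


word = 0x41200000                    # bit pattern of the float 10.0

converted = float(word)              # numeric conversion, like a C cast
reinterpreted = bits_to_float(word)  # same bits, read as a float

print(converted, reinterpreted)
```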