Create a range of decimal numbers using NumPy does not work

为君一笑 submitted on 2021-01-29 05:01:48

Question


I want to create a range of values between 521 and 522 with step 0.1. This is my code:

ICD9CD1 = np.arange(521, 522, 0.1)

The result is:

array([521. , 521.1, 521.2, 521.3, 521.4, 521.5, 521.6, 521.7, 521.8,
       521.9])

but when I want to convert it to a list, this is the result:

np.arange(521, 522, 0.1).tolist()

[521.0,
 521.1,
 521.2,
 521.3000000000001,
 521.4000000000001,
 521.5000000000001,
 521.6000000000001,
 521.7000000000002,
 521.8000000000002,
 521.9000000000002]

What part of my code is wrong? I want this list as my output: [521. , 521.1, 521.2, 521.3, 521.4, 521.5, 521.6, 521.7, 521.8, 521.9]


Answer 1:


You should either use np.arange(5210, 5220) / 10 or np.linspace(521, 522, 10, endpoint=False), but read the whole answer.


Part of your code is wrong, but not the way you're thinking.

Floating-point operations have rounding error. This is fundamental to floating point, and a fundamental limitation of trying to perform real-number computations on computers with finite resources. Even symbolic computation won't fix things - when an expression can't be symbolically simplified, you end up just building giant expression trees instead of actually computing anything.

The mere presence of rounding error in your output doesn't mean you've done something wrong. Also, the rounding error was already present in the arange output, just hidden by NumPy's default print settings - it wasn't introduced in the tolist call. For any even slightly nontrivial floating point calculation, you'll never eliminate all rounding error.
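A small sketch of that point, assuming a NumPy recent enough to provide the np.printoptions context manager: it checks that tolist() hands back the same values the array already held, then prints the array at a higher precision so the hidden digits show up. The variable names are only illustrative.

```python
import numpy as np

arr = np.arange(521, 522, 0.1)
as_list = arr.tolist()

# tolist() converts each float64 to a Python float without changing its value,
# so every list element compares equal to the array element it came from.
same_values = all(float(a) == b for a, b in zip(arr, as_list))
print(same_values)

# The default array display rounds to a few digits. Raising the precision
# shows the digits that were in the array all along.
with np.printoptions(precision=17):
    print(arr)
```

If same_values is true, the extra digits in the list output were not created by the conversion, only revealed by it.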

Even a result that looks like [521. , 521.1, 521.2, 521.3, 521.4, 521.5, 521.6, 521.7, 521.8, 521.9] would actually have rounding error, because the real number 521.1 is not actually representable in binary floating point. Most numbers that look like they're in that list aren't representable in binary floating point. The (64-bit) float 521.1 is actually 521.1000000000000227373675443232059478759765625, but most programming languages don't display the exact value by default.
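To see the exact value stored for a float, one option is the standard-library decimal module, since Decimal(float) converts the binary value without rounding. This is just one way to inspect it:

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly.
exact = Decimal(521.1)
print(exact)  # 521.1000000000000227373675443232059478759765625

# Decimal("521.1") is the true decimal number, so the two differ.
matches_literal = exact == Decimal("521.1")
print(matches_literal)  # False
```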


The part of your code that really is wrong is the use of arange with floating-point inputs. arange is inconsistent with floating-point inputs because, depending on rounding error, it may return more or fewer elements than expected. For example,

np.arange(1, 1.3, 0.1)

returns

array([1. , 1.1, 1.2, 1.3])

rather than

array([1. , 1.1, 1.2])

because of floating-point rounding error in the computation of the length of the result.
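The length computation can be reproduced by hand. This sketch assumes the length is ceil((stop - start) / step), which is how the NumPy documentation describes it; the variable names are made up for the illustration.

```python
import math

import numpy as np

start, stop, step = 1, 1.3, 0.1

# In real-number arithmetic this ratio is exactly 3. In floating point it
# lands slightly above 3, so the ceiling becomes 4.
ratio = (stop - start) / step
expected_len = math.ceil(ratio)

result = np.arange(start, stop, step)
print(ratio, expected_len, len(result))
```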

Also, due to some weird implementation decisions, arange with floating-point inputs has a lot more rounding error than it should even when it gets the right output size.

np.linspace is designed to avoid the problems with arange. It takes the output size as an argument directly, instead of computing it from the step, and has an explicit parameter for whether or not to include the right endpoint. This avoids rounding error in the output size computation, as long as you don't introduce it yourself by computing the output size in floating point. np.linspace also has less rounding error than arange in the computation of the output elements. It is not guaranteed to have the least rounding error possible - for example, np.linspace(0, 3, 148)[::49].tolist() shows excess rounding error - but it does a lot better than floating-point arange.
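As a rough comparison on the same 1 to 1.3 range, the sketch below lets arange infer the length from the step while linspace is given the count directly. The names by_step and by_count are only illustrative.

```python
import numpy as np

# Length inferred from a floating-point step: subject to rounding error.
by_step = np.arange(1, 1.3, 0.1)

# Length stated as an integer, right endpoint explicitly excluded.
by_count = np.linspace(1, 1.3, 3, endpoint=False)

print(len(by_step), by_step)
print(len(by_count), by_count)
```

Because the count is an integer argument, the linspace call always returns exactly three elements here, whatever the rounding of the endpoints.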

np.arange(5210, 5220) / 10 uses arange with integer arguments and divides afterward. This option has only one source of rounding error, the division by 10. This division is guaranteed by the IEEE 754 spec to be correctly rounded, in that the results will be rounded to the floating-point values closest to the ideal real-number results of the division. This option guarantees the least rounding error, beating linspace in some cases. For example, np.arange(148) / 49 beats np.linspace(0, 3, 148) in rounding error.
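One way to check this for the range in the question is to measure each element's exact error with fractions.Fraction. The helper max_abs_error is a hypothetical name written for this illustration, not a NumPy function.

```python
from fractions import Fraction

import numpy as np


def max_abs_error(values, first_tenth):
    """Return the largest exact gap between values[i] and (first_tenth + i) / 10.

    Hypothetical helper for this illustration, not part of NumPy.
    """
    return max(
        abs(Fraction(float(v)) - Fraction(first_tenth + i, 10))
        for i, v in enumerate(values)
    )


scaled = np.arange(5210, 5220) / 10   # integer arange, one rounding per element
stepped = np.arange(521, 522, 0.1)    # floating-point arange

err_scaled = max_abs_error(scaled, 5210)
err_stepped = max_abs_error(stepped, 5210)
print(float(err_scaled), float(err_stepped))
```

Fraction(float) is exact, so the comparison itself adds no rounding error of its own.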




Answer 2:


The arange docs warn:

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use numpy.linspace for these cases.

What you see is a combination of floating-point imprecision and the way arange generates successive values. Most decimal fractions, such as 0.1, have no exact binary floating-point representation.

The arange docs recommend using linspace for fractional steps. Just be careful to choose the right number of points:

In [301]: np.linspace(521,522,11)
Out[301]:
array([521. , 521.1, 521.2, 521.3, 521.4, 521.5, 521.6, 521.7, 521.8,
       521.9, 522. ])
In [302]: np.linspace(521,522,11).tolist()
Out[302]: [521.0, 521.1, 521.2, 521.3, 521.4, 521.5, 521.6, 521.7, 521.8, 521.9, 522.0]
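A minimal sketch of picking that number with integer arithmetic: with the endpoint included, the count is the number of steps plus one, and with endpoint=False it is the number of steps. steps_per_unit is a hypothetical name for this example.

```python
import numpy as np

start, stop = 521, 522
steps_per_unit = 10  # hypothetical name: a step of 0.1 means 10 steps per unit

# Integer arithmetic, so the count itself carries no rounding error.
n_steps = (stop - start) * steps_per_unit

with_end = np.linspace(start, stop, n_steps + 1)
without_end = np.linspace(start, stop, n_steps, endpoint=False)

print(len(with_end), with_end[-1])
print(len(without_end), without_end[-1])
```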

Generating integer values, and scaling them also works:

In [303]: (np.arange(5210,5220)/10)
Out[303]:
array([521. , 521.1, 521.2, 521.3, 521.4, 521.5, 521.6, 521.7, 521.8,
       521.9])
In [304]: (np.arange(5210,5220)/10).tolist()
Out[304]: [521.0, 521.1, 521.2, 521.3, 521.4, 521.5, 521.6, 521.7, 521.8, 521.9]

This can also be done with Python's range:

In [305]: [i/10 for i in range(5210,5220)]
Out[305]: [521.0, 521.1, 521.2, 521.3, 521.4, 521.5, 521.6, 521.7, 521.8, 521.9]



Answer 3:


Try this:

import numpy as np
# Round to 2 decimals, then convert to a list of plain Python floats.
np.around(np.arange(521, 522, 0.1), 2).tolist()


Source: https://stackoverflow.com/questions/60822734/create-a-range-of-decimal-number-using-numpy-does-not-work
