numpy around/rint slow compared to astype(int)

甜味超标 2021-01-12 12:01

Suppose I have something like x = np.random.rand(60000)*400 - 200. IPython's %timeit says:

x.astype(int) takes 0.14ms

2 answers

青春惊慌失措 (original poster)

2021-01-12 12:26
np.around(x).astype(int) and x.astype(int) don't produce the same values. The former rounds to the nearest integer, with ties going to the even neighbour (apart from exact .5 ties it is roughly the same as (x + np.copysign(0.5, x)).astype(int)), whereas the latter rounds towards zero. However,
```
y = np.trunc(x).astype(int)
z = x.astype(int)
```
shows y==z, but calculating y is much slower. So it's the np.trunc and np.around functions that are slow.
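To make the difference concrete, here is a small sketch on a hand-picked array (the sample values are my own choice, not from the question). It shows that rounding and casting disagree, while trunc followed by a cast matches the plain cast.

```python
import numpy as np

# Sample values chosen to include exact .5 ties and negative numbers.
x = np.array([-2.5, -1.7, -0.5, 0.5, 1.5, 2.5, 3.7])

rounded = np.around(x).astype(int)    # nearest integer, ties go to the even neighbour
truncated = x.astype(int)             # fractional part dropped (towards zero)
via_trunc = np.trunc(x).astype(int)   # explicit trunc, then the same cast

print(rounded.tolist())    # [-2, -2, 0, 0, 2, 2, 4]
print(truncated.tolist())  # [-2, -1, 0, 0, 1, 2, 3]
print(np.array_equal(via_trunc, truncated))  # True
```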
```
In [165]: x.dtype
Out[165]: dtype('float64')
In [168]: y.dtype
Out[168]: dtype('int64')
```
So np.trunc(x) rounds towards zero from double to double. Then astype(int) has to convert double to int64.
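If you want to measure the cost of the extra double-to-double pass yourself, something like the following should do. It is only a sketch: the helper names one_pass and two_pass and the fixed seed are my own, and the ratio you see will depend on your NumPy version and CPU, so I am not quoting any numbers.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, only for repeatability
x = rng.random(60000) * 400 - 200       # same size and range as in the question

def one_pass():
    return x.astype(int)                # single float64 -> int conversion

def two_pass():
    return np.trunc(x).astype(int)      # float64 -> float64 pass, then the conversion

assert np.trunc(x).dtype == np.float64  # trunc keeps the floating-point dtype
assert np.array_equal(one_pass(), two_pass())

# Best of 5 repeats, averaged over 200 calls each, reported per call.
t_one = min(timeit.repeat(one_pass, number=200, repeat=5)) / 200
t_two = min(timeit.repeat(two_pass, number=200, repeat=5)) / 200
print(f"astype only:    {t_one * 1e3:.3f} ms")
print(f"trunc + astype: {t_two * 1e3:.3f} ms")
```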

I don't know what Python or NumPy are doing internally, but I know how I would do this in C. Let's discuss some hardware. With SSE4.1 it's possible to do round, floor, ceil, and trunc from double to double using:
```
_mm_round_pd(a, 0); //round: round even
_mm_round_pd(a, 1); //floor: round towards minus infinity
_mm_round_pd(a, 2); //ceil:  round towards positive infinity
_mm_round_pd(a, 3); //trunc: round towards zero
```
but NumPy also needs to support systems without SSE4.1, so it would have to build both with and without SSE4.1 and then use a dispatcher.
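For reference, the four rounding modes above have the same semantics as four NumPy functions. The sketch below only shows that correspondence on sample values of my own choosing. It does not say anything about which instructions NumPy actually emits.

```python
import numpy as np

a = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])

nearest = np.rint(a)    # same semantics as mode 0: nearest, ties to even
floor = np.floor(a)     # same semantics as mode 1: towards minus infinity
ceil = np.ceil(a)       # same semantics as mode 2: towards positive infinity
trunc = np.trunc(a)     # same semantics as mode 3: towards zero

print(nearest.tolist())  # [-2.0, -2.0, -0.0, 0.0, 2.0, 2.0]
print(floor.tolist())    # [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0]
print(ceil.tolist())     # [-2.0, -1.0, -0.0, 1.0, 2.0, 3.0]
print(trunc.tolist())    # [-2.0, -1.0, -0.0, 0.0, 1.0, 2.0]
```

All four results stay float64, which is the double-to-double behaviour described above.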

Converting from double directly to int64 with SSE/AVX, however, is not efficient until AVX512. It is possible, though, to round double to int32 efficiently using only SSE2:
```
_mm_cvtpd_epi32(a);  //round double to int32 then expand to int64
_mm_cvttpd_epi32(a); //trunc double to int32 then expand to int64
```
These convert two doubles to two int32 values, which can then be widened to two int64.

In your case this would work fine since the range is certainly within int32. But unless Python knows the range fits in int32 it can't assume this, so it would have to round or truncate to int64, which is slow. Also, once again, NumPy would have to be built with SSE2 support to do this anyway.
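The range check described above can be written out at the NumPy level. This is only a sketch of the idea: round_to_int64_via_int32 is a hypothetical helper of mine, not a NumPy function, it assumes the input is finite, and whether the narrow path is actually faster on your machine is something you would have to measure.

```python
import numpy as np

def round_to_int64_via_int32(x):
    """Round float64 values to int64, going through int32 when the range allows."""
    r = np.rint(x)                                  # float64 -> float64, ties to even
    if not np.isfinite(r).all():
        raise ValueError("input must be finite")    # NaN/inf cannot be converted safely
    info = np.iinfo(np.int32)
    if r.size and (r.min() < info.min or r.max() > info.max):
        return r.astype(np.int64)                   # does not fit: convert directly
    return r.astype(np.int32).astype(np.int64)      # narrow convert, then widen

x = np.random.rand(60000) * 400 - 200               # the array from the question
out = round_to_int64_via_int32(x)
print(out.dtype)                                    # int64
```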

But maybe you could have used a single-precision floating-point array to begin with. In that case you could have done:
```
_mm_cvtps_epi32(a); //round single to int32
_mm_cvttps_epi32(a); //trunc single to int32
```
These convert four singles to four int32.
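On the NumPy side, the single-precision variant would look roughly like this. The names x32, rounded32 and truncated32 are mine, and the sketch assumes float32 precision is good enough for your data, which it should be for values between -200 and 200.

```python
import numpy as np

# Hypothetical single-precision version of the array from the question.
x32 = (np.random.rand(60000) * 400 - 200).astype(np.float32)

rounded32 = np.rint(x32).astype(np.int32)   # round to nearest, ties to even
truncated32 = x32.astype(np.int32)          # round towards zero

print(np.rint(x32).dtype)   # float32
print(rounded32.dtype)      # int32
print(x32.nbytes)           # 240000
```

The array is also half the size of the float64 version, so there is less memory to move.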

So to answer your question: SSE2 can round or truncate from double to int32 efficiently. AVX512 will be able to round or truncate from double to int64 efficiently as well, using _mm512_cvtpd_epi64(a) or _mm512_cvttpd_epi64(a). SSE4.1 can round/trunc/floor/ceil from float to float or double to double efficiently.