Best way to count Greater Than in numpy 2d array

爷,独闯天下 submitted on 2021-02-08 07:01:22

Question


`results` is a 2D NumPy array with 300,000 rows. I count how many rows have a first-column value of at least 0.7:

import numpy as np

count = 0
for i in range(np.size(results, 0)):
    if results[i][0] >= 0.7:
        count += 1

This takes about 0.7 seconds in Python, but the equivalent C++ code takes less than 0.07 seconds.
How can I make this Python code as fast as possible?


Answer 1:


When doing numerical computation for speed, especially in Python, you want to avoid explicit for loops whenever possible. NumPy is optimized for "vectorized" computation, so hand the work you would typically do in a for loop to NumPy's array indexing (such as boolean masks) and functions like np.where, which run the loop in compiled code instead of in the Python interpreter.
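A minimal sketch of the idea on a tiny hypothetical array: the Python loop and a single vectorized expression give the same count. `np.count_nonzero` is used here as one common way to count the `True` entries of a boolean mask; it is an illustration, not one of the methods timed in this answer.

```python
import numpy as np

# A small hypothetical stand-in for the question's `results` array.
results = np.array([
    [0.9, 0.1],
    [0.2, 0.5],
    [0.7, 0.3],
    [0.4, 0.8],
])

# Loop version: one Python-level comparison per row.
loop_count = 0
for row in results:
    if row[0] >= 0.7:
        loop_count += 1

# Vectorized version: the comparison runs over the whole first column
# in compiled code and produces a boolean mask; count_nonzero counts
# the True entries.
mask = results[:, 0] >= 0.7
vec_count = np.count_nonzero(mask)

print(mask)        # [ True False  True False]
print(loop_count)  # 2
print(vec_count)   # 2
```

Note that the row with exactly 0.7 is counted because the comparison is `>=`, matching the question.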

I did a quick test on a 300,000 x 600 array of random values from 0 to 1 and found the following.

Your code, non-vectorized with one for loop:
226 ms per run

%%timeit
count = 0
for i in range(np.size(results, 0)):
    if results[i][0] >= 0.7:
        count += 1

emilaz's solution:
8.36 ms per run

%%timeit
first_col = results[:, 0]
x = len(first_col[first_col >= 0.7])

Ethan's Solution:
7.84 ms per run

%%timeit
np.bincount(results[:,0]>=.7)[1]

Best I came up with:
6.92 ms per run

%%timeit
len(np.where(results[:, 0] >= 0.7)[0])

All 4 methods yielded the same answer, which for my data was 90,134. Hope this helps!
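`%%timeit` is an IPython/Jupyter magic. Outside a notebook, a comparable check can be written with the standard-library `timeit` module. The sketch below is a hypothetical reconstruction, not the answerer's script: the function names are made up for illustration, it uses `>=` throughout to match the question, it adds `minlength=2` to the `bincount` call, and it defaults to 10 columns instead of 600, because a 300,000 x 600 float64 array needs about 1.4 GB of memory. Timings will vary by machine.

```python
import timeit

import numpy as np


def count_loop(results, threshold=0.7):
    # The question's approach: a Python-level loop over the rows.
    count = 0
    for i in range(np.size(results, 0)):
        if results[i][0] >= threshold:
            count += 1
    return count


def count_mask_len(results, threshold=0.7):
    # Boolean indexing builds a filtered copy; its length is the count.
    first_col = results[:, 0]
    return len(first_col[first_col >= threshold])


def count_bincount(results, threshold=0.7):
    # bincount on the boolean mask; minlength=2 keeps index 1 valid
    # even when no value reaches the threshold.
    return int(np.bincount(results[:, 0] >= threshold, minlength=2)[1])


def count_where(results, threshold=0.7):
    # np.where with one argument returns a tuple of index arrays,
    # one per dimension; the first column is 1D, so take [0].
    return len(np.where(results[:, 0] >= threshold)[0])


METHODS = [count_loop, count_mask_len, count_bincount, count_where]


def compare(n_rows=300_000, n_cols=10, repeats=3, seed=0):
    # Random values in [0, 1), as in the answer's test data.
    rng = np.random.default_rng(seed)
    results = rng.random((n_rows, n_cols))
    counts = {f.__name__: f(results) for f in METHODS}
    # All methods should agree before their timings are worth comparing.
    assert len(set(counts.values())) == 1, counts
    for f in METHODS:
        t = timeit.timeit(lambda: f(results), number=repeats) / repeats
        print(f"{f.__name__:>15}: {t * 1e3:8.2f} ms per run")
    return counts


if __name__ == "__main__":
    compare()
```

Checking that the counts agree before printing any timings guards against comparing the speed of methods that are not actually computing the same thing.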




Answer 2:


Try

first_col = results[:, 0]
res = len(first_col[first_col >= 0.7])

Depending on the shape of your matrix, this can be 2-10 times faster than your approach.
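To see where the work goes in this approach, here is a step-by-step sketch on a tiny hypothetical array. `results[:, 0]` is a view that shares memory with `results`, the comparison builds a boolean mask, and indexing with that mask copies the selected values into a new array whose length is the count. Summing the mask (shown last, not part of this answer) gives the same number without that extra copy.

```python
import numpy as np

# Hypothetical 3 x 2 array standing in for `results`.
results = np.array([[0.75, 1.0],
                    [0.10, 2.0],
                    [0.95, 3.0]])

first_col = results[:, 0]          # basic slicing: a view, no data copied
print(np.shares_memory(first_col, results))  # True

mask = first_col >= 0.7            # elementwise comparison -> bool array
selected = first_col[mask]         # boolean indexing -> a new array (copy)
print(selected)                    # [0.75 0.95]
print(len(selected))               # 2

# Alternative, not from the answer: True counts as 1 when summed.
print(int(mask.sum()))             # 2
```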




Answer 3:


You could give the following a try:

np.bincount(results[:, 0] >= 0.7, minlength=2)[1]

Not sure it's faster, but it should produce the correct answer.
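A caveat with indexing `np.bincount(mask)[1]`: `bincount` returns an array only as long as the largest value present plus one. If no value reaches 0.7, the mask is all `False`, the result has length 1, and `[1]` raises `IndexError`. Passing `minlength=2` always keeps index 1 available. A small sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data where no first-column value reaches the threshold.
results = np.array([[0.1, 0.9],
                    [0.5, 0.9]])
mask = results[:, 0] >= 0.7

print(np.bincount(mask))                  # [2]  (only the False bin exists)
try:
    np.bincount(mask)[1]
except IndexError:
    print("IndexError: no bin for True")

# minlength=2 always yields [count_false, count_true].
print(np.bincount(mask, minlength=2))     # [2 0]
print(np.bincount(mask, minlength=2)[1])  # 0
```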



Source: https://stackoverflow.com/questions/55698337/best-way-to-count-greater-than-in-numpy-2d-array
