Python equivalent of MATLAB's “ismember” function


孤城傲影 2020-11-27 07:05

After many attempts to optimize the code, it seems that one last resort would be to run the code below on multiple cores. I don't know exactly how to conver

5 answers

抹茶落季 (original poster)

2020-11-27 08:06

Try the ismember library.

pip install ismember

Simple example:

# Import library
from ismember import ismember
import numpy as np

# data
A = np.array([3,4,4,3,6])
B = np.array([2,5,2,6,3])

# Lookup
Iloc,idx = ismember(A, B)
 
# Iloc is a boolean array: True where the element of A exists in B
print(Iloc)
# [ True False False  True  True]

# idx holds, for each matching element of A, its index in B
print(idx)
# [4 4 3]

print(B[idx])
# [3 3 6]

print(A[Iloc])
# [3 3 6]

# These vectors will match
A[Iloc]==B[idx]
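
If installing an extra package is not an option, the same result can be approximated with plain NumPy. The following is a minimal sketch, not the library's implementation: the function name `ismember_numpy` is hypothetical, and it assumes that for duplicated values in B the index of the first occurrence is wanted.

```python
import numpy as np

def ismember_numpy(a, b):
    """Hypothetical NumPy-only stand-in for ismember.

    Returns (mask, idx): mask[i] is True where a[i] occurs in b, and
    idx holds, for each matching a[i], the index of its first occurrence in b.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # Sorted unique values of b, plus the index of each value's first occurrence
    b_unique, first_idx = np.unique(b, return_index=True)
    mask = np.isin(a, b_unique)
    # Position of each matching value inside the sorted unique array
    pos = np.searchsorted(b_unique, a[mask])
    return mask, first_idx[pos]

A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])
mask, idx = ismember_numpy(A, B)
print(mask)
print(idx)
```

For the A and B used above, this reproduces the mask and indices shown in the example, so `A[mask]` and `B[idx]` again contain the same values.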

Speed check:

import numpy as np
from ismember import ismember
from datetime import datetime

# Dictionary-based alternative, defined before the loop that calls it
def ismember_stack(a, b):
    # Map each value of b to the index of its first occurrence
    bind = {}
    for i, elt in enumerate(b):
        if elt not in bind:
            bind[elt] = i
    return [bind.get(itm, None) for itm in a]  # None can be replaced by any other "not in b" value

t1=[]
t2=[]
# Create 1000 random vector lengths between 10 and 9999
ns = np.random.randint(10,10000,1000)

for n in ns:
    a_vec = np.random.randint(0,100,n)
    b_vec = np.random.randint(0,100,n)

    # Run stack version
    start = datetime.now()
    out1=ismember_stack(a_vec, b_vec)
    end = datetime.now()
    t1.append(end - start)

    # Run ismember
    start = datetime.now()
    out2=ismember(a_vec, b_vec)
    end = datetime.now()
    t2.append(end - start)


# Total time of the dictionary-based version
print(np.sum(t1))
# 0:00:07.778331

# Total time of the ismember library
print(np.sum(t2))
# 0:00:04.609801

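In this run the ismember function from PyPI took about 4.6 s in total against about 7.8 s for the dictionary-based version, so it is roughly 1.7x faster.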
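
Timings taken with `datetime.now()` include wall-clock jitter, so for a benchmark like this one `time.perf_counter()` with a few repeats tends to give steadier numbers. A minimal sketch, assuming only NumPy is installed: `time_call` is a hypothetical helper, and only the dictionary-based function is timed so the block stays self-contained.

```python
import time
import numpy as np

def ismember_stack(a, b):
    # Map each value of b to the index of its first occurrence
    bind = {}
    for i, elt in enumerate(b):
        if elt not in bind:
            bind[elt] = i
    return [bind.get(itm, None) for itm in a]

def time_call(func, *args, repeats=5):
    """Hypothetical helper: best elapsed time of func(*args) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Fixed seed so the input vectors are reproducible
rng = np.random.default_rng(0)
a_vec = rng.integers(0, 100, 10_000)
b_vec = rng.integers(0, 100, 10_000)

elapsed = time_call(ismember_stack, a_vec, b_vec)
print(f"ismember_stack: {elapsed:.6f} s")
```

The same helper can be pointed at `ismember` from the library to compare the two on identical inputs.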

Large vectors, e.g. 700,000 elements:

import numpy as np
from ismember import ismember
from datetime import datetime

A = np.random.randint(0,100,700000)
B = np.random.randint(0,100,700000)

# Lookup
start = datetime.now()
Iloc,idx = ismember(A, B)
end = datetime.now()

# Print time
print(end-start)
# 0:00:01.194801
