Python equivalent of MATLAB's “ismember” function

孤城傲影 2020-11-27 07:05

After many attempts to optimize the code, it seems that the last resort would be to run the code below on multiple cores. I don't know exactly how to conver

5 answers
  •  抹茶落季
    2020-11-27 08:06

    Try the ismember library.

    pip install ismember
    

    Simple example:

    # Import library
    from ismember import ismember
    import numpy as np
    
    # data
    A = np.array([3,4,4,3,6])
    B = np.array([2,5,2,6,3])
    
    # Lookup
    Iloc,idx = ismember(A, B)
     
    # Iloc is a boolean array: True where the element of A exists in B
    print(Iloc)
    # [ True False False  True  True]
    
    # idx holds, for each element of A found in B, its index in B
    print(idx)
    # [4 4 3]
    
    print(B[idx])
    # [3 3 6]
    
    print(A[Iloc])
    # [3 3 6]
    
    # These vectors will match
    A[Iloc]==B[idx]
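
If adding a dependency is not an option, the same two outputs can be approximated with NumPy alone. The following is a minimal sketch, not the library's implementation: `ismember_numpy` is a hypothetical name, and it assumes that, as in MATLAB, the index of the first occurrence in B is wanted for each match.

```python
import numpy as np

def ismember_numpy(a, b):
    """Hypothetical NumPy-only helper mimicking the (Iloc, idx) outputs above."""
    a = np.asarray(a)
    b = np.asarray(b)
    if b.size == 0:
        # Nothing can match an empty b
        return np.zeros(a.shape, dtype=bool), np.array([], dtype=np.intp)
    # Sorted unique values of b and the index of their first occurrence in b
    b_unique, first_idx = np.unique(b, return_index=True)
    # Candidate position of each element of a among the sorted unique values
    pos = np.searchsorted(b_unique, a)
    # Values larger than max(b) land one past the end; clip them back in range
    pos = np.clip(pos, 0, len(b_unique) - 1)
    # A candidate is a real match only if the value at that position equals a
    mask = b_unique[pos] == a
    return mask, first_idx[pos[mask]]

A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])
Iloc, idx = ismember_numpy(A, B)
print(Iloc.tolist())  # [True, False, False, True, True]
print(idx.tolist())   # [4, 4, 3]
```

When only the boolean mask is needed, `np.isin(A, B)` is enough and no index bookkeeping is required.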
    

    Speed check:

    from ismember import ismember
    from datetime import datetime
    import numpy as np
    
    # Dict-based reference implementation (pure Python), defined before it is used
    def ismember_stack(a, b):
        bind = {}
        for i, elt in enumerate(b):
            if elt not in bind:
                bind[elt] = i  # keep the first index of each value in b
        return [bind.get(itm, None) for itm in a]  # None can be replaced by any other "not in b" value
    
    t1=[]
    t2=[]
    # Create 1000 random vector lengths between 10 and 9999
    ns = np.random.randint(10,10000,1000)
    
    for n in ns:
        a_vec = np.random.randint(0,100,n)
        b_vec = np.random.randint(0,100,n)
    
        # Run stack version
        start = datetime.now()
        out1=ismember_stack(a_vec, b_vec)
        end = datetime.now()
        t1.append(end - start)
    
        # Run ismember
        start = datetime.now()
        out2=ismember(a_vec, b_vec)
        end = datetime.now()
        t2.append(end - start)
    
    
    # Total time of the stack version
    print(np.sum(t1))
    # 0:00:07.778331
    
    # Total time of ismember
    print(np.sum(t2))
    # 0:00:04.609801
    
    

    In this test, the ismember function from PyPI is roughly 1.7x faster (about 4.6 s versus 7.8 s in total).
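
Wrapping single calls in `datetime.now()` can be noisy, so a repeated measurement may give a steadier picture. The sketch below uses the standard-library `timeit` module; `best_time` is a hypothetical helper, and `np.isin` stands in as the NumPy baseline (it returns only the membership mask, not the indices), so the numbers are not directly comparable to the totals above.

```python
import timeit
import numpy as np

def ismember_stack(a, b):
    # Map each value of b to the index of its first occurrence
    first_index = {}
    for i, elt in enumerate(b):
        if elt not in first_index:
            first_index[elt] = i
    # None marks elements of a that do not occur in b
    return [first_index.get(itm) for itm in a]

def best_time(func, *args, repeat=5, number=10):
    """Hypothetical helper: best per-call time in seconds over `repeat` runs."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number

rng = np.random.default_rng(0)
a_vec = rng.integers(0, 100, 10_000)
b_vec = rng.integers(0, 100, 10_000)

t_dict = best_time(ismember_stack, a_vec, b_vec)
t_isin = best_time(np.isin, a_vec, b_vec)
print(f"dict lookup: {t_dict:.6f} s per call")
print(f"np.isin:     {t_isin:.6f} s per call")
```

Taking the minimum over several repeats is a common choice because slower runs usually reflect interference from other processes rather than the code itself.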

    Large vectors, e.g. 700,000 elements:

    from ismember import ismember
    from datetime import datetime
    import numpy as np
    
    A = np.random.randint(0,100,700000)
    B = np.random.randint(0,100,700000)
    
    # Lookup
    start = datetime.now()
    Iloc,idx = ismember(A, B)
    end = datetime.now()
    
    # Print time
    print(end-start)
    # 0:00:01.194801
    
