Is there a better way to determine cross-mapping indices for NumPy arrays?

Happy的楠姐 2021-01-23 07:19

I need the cross-mapped indices for NumPy union and intersection operations. The code I have below works fine, but I would like to vectorize it before applying it to large data.

1 answer
  •  既然无缘
    2021-01-23 07:41

    For cases like these, it helps to convert the strings to numeric IDs first, since working with numbers is far more efficient than working with strings. The outputs are numeric arrays anyway, so it makes sense to have numeric IDs upfront. I have seen people use lambda, among other approaches, for this conversion, but I would go with np.unique, which is quite efficient for cases like these. Here's the implementation, starting with the numeric ID conversion:

    import numpy as np

    # A and B are the two 1-D input arrays from the question.

    # ------------------------ Setup work -------------------------------
    # Map every element of A and B to an integer ID shared by both arrays
    _, idx1 = np.unique(np.append(A, B), return_inverse=True)
    A_ID = idx1[:A.size]
    B_ID = idx1[A.size:]

    # ------------------------ Union work -------------------------------
    # Length of zc1 is the number of unique IDs, i.e. max ID + 1
    lenC = idx1.max() + 1

    # Initialize output array zc1, filled with NaNs
    zc1 = np.full((lenC, 3), np.nan)

    # Fill first column with consecutive numbers starting at 0
    zc1[:, 0] = np.arange(lenC)

    # Most important part of the code:
    # in columns 1 and 2, at the rows given by the IDs of A and B respectively,
    # store each element's position within A and within B
    zc1[A_ID, 1] = np.arange(A_ID.size)
    zc1[B_ID, 2] = np.arange(B_ID.size)

    # ------------------------ Intersection work -------------------------------
    # Get the (index in A, index in B) pairs whose IDs match
    intersect_ID = np.argwhere(A_ID[:, None] == B_ID)

    # Initialize output zd1 based on the number of intersections
    lenD = intersect_ID.shape[0]
    zd1 = np.full((lenD, 3), np.nan)

    # Fill first column with consecutive numbers starting at 0,
    # and the other two columns with the matching index pairs
    zd1[:, 0] = np.arange(lenD)
    zd1[:, 1:] = intersect_ID
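
To make the output layout concrete, here is a minimal sketch that wraps the same steps in a function and runs them on a small input. The function name `cross_map` and the sample arrays are hypothetical, not taken from the question. The sample data deliberately contains no duplicates within either array, and the function assumes non-empty 1-D inputs.

```python
import numpy as np

def cross_map(A, B):
    """Return (zc, zd): union and intersection index maps for 1-D arrays A and B."""
    A = np.asarray(A)
    B = np.asarray(B)

    # Shared integer IDs for every element of A and B
    _, idx = np.unique(np.append(A, B), return_inverse=True)
    idx = idx.ravel()
    A_ID, B_ID = idx[:A.size], idx[A.size:]

    # Union: one row per unique value; columns are [row number, index in A, index in B]
    n_union = idx.max() + 1
    zc = np.full((n_union, 3), np.nan)
    zc[:, 0] = np.arange(n_union)
    zc[A_ID, 1] = np.arange(A.size)
    zc[B_ID, 2] = np.arange(B.size)

    # Intersection: one row per matching (index in A, index in B) pair
    pairs = np.argwhere(A_ID[:, None] == B_ID)
    zd = np.full((pairs.shape[0], 3), np.nan)
    zd[:, 0] = np.arange(pairs.shape[0])
    zd[:, 1:] = pairs
    return zc, zd

# Hypothetical sample data, chosen without duplicates
A = np.array(["b", "a", "d"])
B = np.array(["c", "a", "b"])
zc, zd = cross_map(A, B)
print(zc)
print(zd)
```

With this input the unique values sort to a, b, c, d, so `zc` has four rows. The row for "c" holds NaN in the A column and the row for "d" holds NaN in the B column, since each appears in only one array. `zd` has two rows, pairing A[0] with B[2] (both "b") and A[1] with B[1] (both "a"). Note that the comparison `A_ID[:, None] == B_ID` builds a temporary boolean array of shape (A.size, B.size), which is worth keeping in mind for large inputs.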
    
