Condensed matrix function to find pairs

忘了有多久 2020-12-24 05:18

For a set of observations:

[a1,a2,a3,a4,a5]

their pairwise distances

d = [[0,   a12, a13, a14, a15],
     [a21, 0,   a23, a24, a25],
     [a31, …
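For reference, here is how a full matrix like d maps onto the condensed vector that scipy.spatial.distance.pdist returns. Only the strict upper triangle is kept, read row by row, so the pair (i, j) with i < j among n observations lands at position n*i - i*(i+1)/2 + (j - i - 1). This is a minimal NumPy-only sketch that assumes Euclidean distances and made-up 2-D observations. The name condensed_index is a hypothetical helper, not a SciPy function.

```python
import numpy as np

def condensed_index(n, i, j):
    """Hypothetical helper: position of pair (i, j), i != j, in the condensed
    (strict upper triangle, row-major) vector of an n x n distance matrix."""
    if i == j:
        raise ValueError("diagonal entries are not stored in condensed form")
    if i > j:
        i, j = j, i  # d[i, j] == d[j, i], so both orders map to the same slot
    return n * i - i * (i + 1) // 2 + (j - i - 1)

# Five 2-D observations standing in for a1..a5 (illustrative data only).
obs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 2.0]])
n = len(obs)

# Full pairwise Euclidean distance matrix d, shaped like the one in the question.
d = np.sqrt(((obs[:, None, :] - obs[None, :, :]) ** 2).sum(axis=-1))

# Condensed form: the strict upper triangle read row by row
# (the same ordering that pdist uses).
rows, cols = np.triu_indices(n, k=1)
condensed = d[rows, cols]  # n * (n - 1) / 2 == 10 entries

# Every off-diagonal entry of d is found at the position the formula predicts.
for i in range(n):
    for j in range(n):
        if i != j:
            assert condensed[condensed_index(n, i, j)] == d[i, j]
```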


        
7 answers
  •  南方客 (OP)
    2020-12-24 05:56

    To complete the list of answers to this question: a fast, vectorized version of fgregg's answer (as suggested by David Marx) could look like this:

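    import numpy as np

    def vec_row_col(d, i):
        """Map condensed index (or array of indices) i to (row, col) for d observations."""
        i = np.array(i)
        b = 1 - 2 * d
        # Row: smaller root of the quadratic for the row index, rounded down
        x = np.floor((-b - np.sqrt(b**2 - 8*i)) / 2).astype(int)
        # Column: offset within row x; x*(b + x + 2) is always even, so // is exact
        y = (i + x*(b + x + 2) // 2 + 1).astype(int)
        if i.shape:
            # Array input: list of (row, col) pairs (zip alone is lazy in Python 3)
            return list(zip(x, y))
        else:
            # Scalar input: a single (row, col) tuple
            return (x, y)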
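The two lines that compute x and y come from counting how many pairs are stored before each row. Below is a short derivation in the function's own names (d observations, condensed index i, row x, column y, b = 1 - 2d). Because S(x + 1) - S(x) = d - 1 - x > 0 for the valid rows x = 0, ..., d - 2, the condition S(x) <= i < S(x + 1) puts the smaller root of S(x) = i in [x, x + 1). That is why flooring it gives the row.

```latex
\begin{aligned}
S(x) &= \sum_{r=0}^{x-1} (d - 1 - r) = x d - \frac{x(x+1)}{2}
  && \text{pairs stored in rows } 0,\dots,x-1 \\
S(x) = i &\iff x^2 + b x + 2i = 0, \quad b = 1 - 2d \\
x &= \left\lfloor \frac{-b - \sqrt{b^2 - 8i}}{2} \right\rfloor
  && \text{smaller root, rounded down} \\
y &= i - S(x) + x + 1 = i + \frac{x\,(b + x + 2)}{2} + 1
  && \text{columns in row } x \text{ start at } x + 1
\end{aligned}
```

Since x(b + x + 2) = x(x - 2d + 3) and exactly one of x and x + 3 is even, the product is always even and y is an exact integer.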
    

    I needed to do these calculations for huge arrays, and the speedup compared with the unvectorized version (https://stackoverflow.com/a/14839010/3631440) is, as usual, quite impressive (timed with IPython's %timeit):

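    import math
    import numpy as np
    from scipy.spatial import distance

    def row_col_from_condensed_index(d, i):
        # Stand-in for the un-vectorized function from the linked answer:
        # the same formula, applied to one index at a time
        b = 1 - 2 * d
        x = math.floor((-b - math.sqrt(b**2 - 8*i)) / 2)
        y = i + x*(b + x + 2)//2 + 1
        return (x, y)

    test = np.random.rand(1000, 1000)       # 1000 observations, 1000 features each
    condense = distance.pdist(test)         # 1000*999/2 = 499500 distances
    sample = np.random.randint(0, len(condense), 1000)

    %timeit res = vec_row_col(1000, sample)
    10000 loops, best of 3: 156 µs per loop

    res = []
    %timeit for i in sample: res.append(row_col_from_condensed_index(1000, i))
    100 loops, best of 3: 5.87 ms per loop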
    

    That's about 37 times faster in this example (156 µs vs. 5.87 ms for 1000 indices)!
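For comparison, here is a different technique that is not part of the answer above. You can precompute the row and column of every condensed position once with np.triu_indices and then index into those two arrays. This sketch assumes the lookup table fits in memory: each array has d(d-1)/2 entries, which is 499,500 for d = 1000. That memory cost is the tradeoff against the closed-form vec_row_col when d is really huge. The version of vec_row_col here is adapted to return two arrays instead of a list of pairs, which makes the cross-check easy.

```python
import numpy as np

def vec_row_col(d, i):
    """Closed-form mapping from the answer above, returning (rows, cols) arrays."""
    i = np.asarray(i)
    b = 1 - 2 * d
    x = np.floor((-b - np.sqrt(b**2 - 8 * i)) / 2).astype(int)
    y = i + x * (b + x + 2) // 2 + 1  # exact: x * (b + x + 2) is always even
    return x, y

d = 1000
n_pairs = d * (d - 1) // 2  # 499500 condensed entries
rng = np.random.default_rng(42)
sample = rng.integers(0, n_pairs, size=1000)

# Closed form: no extra memory, a few vectorized float ops per index.
x1, y1 = vec_row_col(d, sample)

# Lookup table: strict upper triangle in row-major order (pdist's ordering),
# built once, then plain integer indexing per query.
rows, cols = np.triu_indices(d, k=1)
x2, y2 = rows[sample], cols[sample]

assert np.array_equal(x1, x2) and np.array_equal(y1, y2)
```

Both approaches produce identical pairs. The lookup table is simple and avoids floating-point square roots, but it costs two arrays of d(d-1)/2 integers. The closed form needs no extra storage.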
