Python - How to generate the Pairwise Hamming Distance Matrix

既然无缘 2021-01-23 06:19

Beginner with Python here. So I'm having trouble trying to calculate the resulting binary pairwise Hamming distance matrix between the rows of an input matrix using only th

2 Answers
  •  忘掉有多难
    2021-01-23 07:00

    For reasons I do not understand, this expression

    (2 * np.inner(a-0.5, 0.5-a) + a.shape[1] / 2)
    

    appears to be much faster than @Psidom's broadcasting comparison for larger arrays:

    import numpy as np
    from timeit import timeit

    a = np.random.randint(0, 2, (100, 1000))
    timeit(lambda: (a[:, None, :] != a).sum(2), number=100)
    # 2.297890231013298
    timeit(lambda: (2 * np.inner(a-0.5, 0.5-a) + a.shape[1] / 2), number=100)
    # 0.10616962902713567
    

    @Psidom's version is a bit faster for the very small example:

    a
    # array([[1, 0, 0, 1, 1, 0],
    #        [1, 0, 0, 0, 0, 0],
    #        [1, 1, 1, 1, 0, 0]])
    
    timeit(lambda: (a[:, None, :] != a).sum(2), number=100)
    # 0.0004370050155557692
    timeit(lambda: (2 * np.inner(a-0.5, 0.5-a) + a.shape[1] / 2), number=100)
    # 0.00068191799800843
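
The two expressions give the same matrix because, for 0/1 entries, `(x - 0.5) * (0.5 - y)` is -0.25 when `x == y` and +0.25 when `x != y`. For two rows of length n that differ in d positions, the inner product is therefore (2d - n)/4; doubling it and adding n/2 leaves d. A minimal check on the small array above (the names `d_broadcast` and `d_inner` are introduced here only for illustration):

```python
import numpy as np

# The small example array shown above.
a = np.array([[1, 0, 0, 1, 1, 0],
              [1, 0, 0, 0, 0, 0],
              [1, 1, 1, 1, 0, 0]])

# Broadcasting approach: compare every row with every other row, count mismatches.
d_broadcast = (a[:, None, :] != a).sum(2)

# Inner-product approach: each term is +0.25 on a mismatch and -0.25 on a match.
d_inner = 2 * np.inner(a - 0.5, 0.5 - a) + a.shape[1] / 2

print(d_broadcast)
# [[0 2 3]
#  [2 0 3]
#  [3 3 0]]
print(np.array_equal(d_broadcast, d_inner))  # True
```

Note that the inner-product version returns a float array, while the broadcasting version returns integers.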
    

    Update

    Part of the reason appears to be that the inner product is faster on floats than on integer dtypes:

    timeit(lambda: (0.5 * np.inner(2*a-1, 1-2*a) + a.shape[1] / 2), number=100)
    # 0.7315902590053156
    timeit(lambda: (0.5 * np.inner(2.0*a-1, 1-2.0*a) + a.shape[1] / 2), number=100)
    # 0.12021801102673635
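
A likely explanation, stated here as an assumption rather than something the timings prove, is that NumPy can hand floating-point matrix products to an optimized BLAS routine, while integer products take a slower generic path. If that holds, casting the input to float once and converting the result back to integers keeps the speed and returns integer distances. A sketch under that assumption; `hamming_matrix` is a hypothetical helper name, not part of NumPy:

```python
import numpy as np

def hamming_matrix(a):
    """Pairwise Hamming distances between the rows of a 0/1 array (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64)  # cast once so the inner product runs on floats
    n = a.shape[1]
    d = 2 * np.inner(a - 0.5, 0.5 - a) + n / 2
    # The result is integer-valued; rint only converts it to a clean integer dtype.
    return np.rint(d).astype(np.int64)

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(100, 1000))
d = hamming_matrix(a)

# Cross-check against the broadcasting comparison.
print(np.array_equal(d, (a[:, None, :] != a).sum(2)))  # True
```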
    
