Calculating the similarity of two lists

野趣味 2020-12-24 03:55

I have two lists:

eg. a = [1,8,3,9,4,9,3,8,1,2,3] and b = [1,8,1,3,9,4,9,3,8,1,2,3]

Both contain ints. There is no meaning behind the ints (eg. 1 is not 'cl

6 answers
  •  悲哀的现实
    2020-12-24 04:06

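    One way to tackle this is to compare histograms of the two lists. As an example (demonstrated with numpy; note that with bin edges 1 to 9 the last bin is closed on both sides, so the values 8 and 9 are counted together):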

    In []: from numpy import array, arange, histogram, dot
    In []: from numpy.linalg import norm
    In []: a= array([1,8,3,9,4,9,3,8,1,2,3])
    In []: b= array([1,8,1,3,9,4,9,3,8,1,2,3])
    
    In []: a_c, _= histogram(a, arange(9)+ 1)
    In []: a_c
    Out[]: array([2, 1, 3, 1, 0, 0, 0, 4])
    
    In []: b_c, _= histogram(b, arange(9)+ 1)
    In []: b_c
    Out[]: array([3, 1, 3, 1, 0, 0, 0, 4])
    
    In []: (a_c- b_c).sum()
    Out[]: -1
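
The signed sum above only recovers the difference in list lengths, because positive and negative count differences cancel each other. As a minimal sketch, assuming one count per integer value is wanted (so that 8 and 9 stay separate) and that all values are non-negative ints, `np.bincount` can replace the binned histogram, and the absolute differences can be summed instead. The names `a_counts`, `b_counts`, `signed_diff` and `abs_diff` are illustrative and not part of the original answer.

```python
import numpy as np

a = np.array([1, 8, 3, 9, 4, 9, 3, 8, 1, 2, 3])
b = np.array([1, 8, 1, 3, 9, 4, 9, 3, 8, 1, 2, 3])

# One slot per integer value 0..max, so 8 and 9 are counted separately.
size = int(max(a.max(), b.max())) + 1
a_counts = np.bincount(a, minlength=size)
b_counts = np.bincount(b, minlength=size)

# Signed sum: differences cancel, leaving only len(a) - len(b).
signed_diff = int((a_counts - b_counts).sum())

# Sum of absolute differences: total mismatch between the two count vectors.
abs_diff = int(np.abs(a_counts - b_counts).sum())

print(a_counts, b_counts, signed_diff, abs_diff)
```

Unlike the signed sum, the absolute version cannot report zero for two lists whose counts differ.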
    

    There are now plenty of ways to compare a_c and b_c.

    The (seemingly) simplest similarity measure is:

    In []: 1- abs(-1/ 9.)
    Out[]: 0.8888888888888888
    

    Followed by the ratio of the norms:

    In []: norm(a_c)/ norm(b_c)
    Out[]: 0.92796072713833688
    

    and one based on the residual of b_c after projecting it onto a_c:

    In []: a_n= (a_c/ norm(a_c))[:, None]
    In []: 1- norm(b_c- dot(dot(a_n, a_n.T), b_c))/ norm(b_c)
    Out[]: 0.84445724579043624
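
The last two measures can be wrapped in plain functions, which makes it easier to see what they compute. The sketch below is only an illustration, and `projection_similarity` is a hypothetical name. It projects `b_counts` onto the direction of `a_counts` and returns 1 minus the relative length of the leftover part, which equals 1 - |sin θ| for the angle θ between the two count vectors. It assumes neither vector is all zeros.

```python
import numpy as np

def projection_similarity(a_counts, b_counts):
    """1 minus the relative length of the part of b_counts orthogonal to a_counts."""
    a_counts = np.asarray(a_counts, dtype=float)
    b_counts = np.asarray(b_counts, dtype=float)
    a_unit = a_counts / np.linalg.norm(a_counts)       # unit vector along a_counts
    projection = np.dot(a_unit, b_counts) * a_unit     # component of b_counts along a_counts
    residual = b_counts - projection                   # what the projection does not explain
    return 1.0 - np.linalg.norm(residual) / np.linalg.norm(b_counts)

# Count vectors taken from the session above.
a_c = [2, 1, 3, 1, 0, 0, 0, 4]
b_c = [3, 1, 3, 1, 0, 0, 0, 4]

score = projection_similarity(a_c, b_c)
norm_ratio = np.linalg.norm(a_c) / np.linalg.norm(b_c)
print(score, norm_ratio)
```

The projection measure depends only on the angle between the vectors, so it ignores list length, while the norm ratio is not symmetric in its arguments and exceeds 1 when the first vector has the larger norm.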
    

    Thus, you need to be much more specific in order to find the similarity measure most suitable for your purposes.
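
As one illustration of how the requirements change the choice: histogram-based measures ignore element order entirely. If order matters, a sequence-matching measure may fit better. The sketch below swaps in `difflib.SequenceMatcher` from the Python standard library, which is a different technique from the histogram approach above and is shown only as a possible alternative.

```python
from difflib import SequenceMatcher

a = [1, 8, 3, 9, 4, 9, 3, 8, 1, 2, 3]
b = [1, 8, 1, 3, 9, 4, 9, 3, 8, 1, 2, 3]

# ratio() returns 2 * M / T, where M is the number of matched elements
# and T is the total number of elements in both sequences.
matcher = SequenceMatcher(None, a, b)
order_aware_ratio = matcher.ratio()

# Reversing one list keeps its histogram identical but changes the order.
reversed_ratio = SequenceMatcher(None, a, b[::-1]).ratio()

print(order_aware_ratio, reversed_ratio)
```

Here `b` is `a` with a single extra element inserted, so the order-aware ratio is close to 1, while comparing against the reversed list shows how this measure reacts to reordering even though the counts stay the same.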
