Calculating the similarity of two lists

野趣味 2020-12-24 03:55

I have two lists:

e.g. a = [1,8,3,9,4,9,3,8,1,2,3] and b = [1,8,1,3,9,4,9,3,8,1,2,3]

Both contain ints. There is no meaning behind the ints (e.g. 1 is not 'cl…

  •  臣服心动
    2020-12-24 04:09

    I implemented something for a similar task a long time ago; all I have left of it now is a blog entry. The idea was simple: compute the pdf (the distribution of values) of both sequences, then measure the common area covered by the two pdfs when they are plotted.

    Sorry about the broken images at the link; the external server I used back then is dead now.
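Written out, and assuming that "pdf" here means the empirical distribution of each list (how often a value occurs divided by the length of the list), the common area is the sum of the pointwise minima, a measure sometimes called histogram intersection. The notation below (n_a(k) for the number of times k occurs in a, p_a(k) for its share) is my own, not from the blog post. The second form, one minus half the L1 distance, holds only when both p_a and p_b sum to 1.

```latex
p_a(k) = \frac{n_a(k)}{|a|}, \qquad p_b(k) = \frac{n_b(k)}{|b|}

\operatorname{overlap}(a, b) = \sum_{k} \min\bigl(p_a(k),\, p_b(k)\bigr)
                             = 1 - \tfrac{1}{2} \sum_{k} \bigl|p_a(k) - p_b(k)\bigr|
```

The score is 1 when both lists have identical value frequencies and 0 when they share no values at all. It ignores the order of the elements entirely.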

    For your problem, the code translates to:

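    def overlap(pdf1, pdf2):
        # Common area of two discrete distributions: for every value
        # present in both, add the smaller of its two probabilities.
        s = 0
        for k in pdf1:
            if k in pdf2:
                s += min(pdf1[k], pdf2[k])
        return s
    
    def pdf(l):
        # Count how often each value occurs, then normalize the counts.
        d = {}
        s = 0.0
        for i in l:
            # Note: this adds the value i itself, so s ends up as sum(l),
            # not len(l); the normalized values do not sum to 1.
            s += i
            if i in d:
                d[i] += 1
            else:
                d[i] = 1
        for k in d:
            d[k] /= s
        return d
    
    def solve():
        a = [1, 8, 3, 9, 4, 9, 3, 8, 1, 2, 3]
        b = [1, 8, 1, 3, 9, 4, 9, 3, 8, 1, 2, 3]
        pdf_a = pdf(a)
        pdf_b = pdf(b)
        print(pdf_a)
        print(pdf_b)
        # overlap() is symmetric, so both calls print the same value.
        print(overlap(pdf_a, pdf_b))
        print(overlap(pdf_b, pdf_a))
    
    if __name__ == '__main__':
        solve()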
    

    Unfortunately, it gives an unexpected answer: only 0.212292609351.
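The low number comes from the normalization rather than from the overlap idea itself: `pdf` accumulates `s += i`, so it divides each count by the sum of the list's values (51 for `a`, 52 for `b`) instead of by the number of elements (11 and 12). Neither "distribution" then sums to 1, and the overlap can be at most about 0.23. Below is a minimal Python 3 sketch that normalizes by `len(values)` instead; the names `distribution` and `similarity` are hypothetical helpers of my own, and `collections.Counter` replaces the hand-written counting loop. Like the original, it compares only how often each value occurs, not the order of the elements.

```python
from collections import Counter


def distribution(values):
    """Empirical distribution: each distinct value -> count / len(values)."""
    # Hypothetical helper: normalizes by the number of elements, so the
    # probabilities sum to 1 (an empty list yields an empty dict).
    n = len(values)
    return {value: count / n for value, count in Counter(values).items()}


def overlap(p, q):
    """Common area of two discrete distributions (sum of pointwise minima)."""
    # A value missing from either side contributes min(x, 0) = 0,
    # so only the keys shared by both distributions are visited.
    return sum(min(p[k], q[k]) for k in p.keys() & q.keys())


def similarity(a, b):
    """Order-insensitive score in [0, 1]; 1.0 means identical value frequencies."""
    return overlap(distribution(a), distribution(b))


if __name__ == "__main__":
    a = [1, 8, 3, 9, 4, 9, 3, 8, 1, 2, 3]
    b = [1, 8, 1, 3, 9, 4, 9, 3, 8, 1, 2, 3]
    print(similarity(a, b))  # 41/44, about 0.932
```

For the example lists the score works out to 2/11 + 9/12 = 41/44 ≈ 0.932: the only frequency that differs noticeably is that of 1, which occurs twice in `a` and three times in `b`.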
