What is wrong with this Python function from “Programming Collective Intelligence”?

梦如初夏 2021-01-06 07:31

This is the function in question. It calculates the Pearson correlation coefficient for p1 and p2, which is supposed to be a number between -1 and 1.

When I use this

4 answers
  •  佛祖请我去吃肉
    2021-01-06 07:38

    I couldn't pinpoint what's wrong with the logic in your function, so I reimplemented it directly from the definition of the Pearson coefficient:

    from math import sqrt
    
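    def sim_pearson(p1, p2):
        # Use only the items present in both dicts; a union would raise
        # KeyError for any item that one of the dicts is missing.
        keys = set(p1) & set(p2)
        n = len(keys)
        if n == 0:
            return 0.0  # nothing in common, so no correlation to measure

        # Mean rating of each dict over the shared items
        a1 = sum(p1[it] for it in keys) / n
        a2 = sum(p2[it] for it in keys) / n

        # Sums of squared deviations from the means
        sum1Sq = sum((p1[it] - a1) ** 2 for it in keys)
        sum2Sq = sum((p2[it] - a2) ** 2 for it in keys)

        # Numerator: sum of the products of the deviations
        num = sum((p1[it] - a1) * (p2[it] - a2) for it in keys)
        den = sqrt(sum1Sq * sum2Sq)
        if den == 0:
            # One side has constant ratings, so the coefficient is
            # undefined; returning 0 here is a convention, not a rule.
            return 0.0

        return num / den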
    
    critics = {
        'user1':{
            'item1': 3,
            'item2': 5,
            'item3': 5,
            },
    
        'user2':{
            'item1': 4,
            'item2': 5,
            'item3': 5,
            }
    }
    
    assert 0.999 < sim_pearson(critics['user1'], critics['user1']) < 1.0001
    
    print('Your example:', sim_pearson(critics['user1'], critics['user2']))
    print('Another example:', sim_pearson({1: 1, 2: 2, 3: 3}, {1: 4, 2: 0, 3: 1}))
    

    Note that in your example the Pearson coefficient is exactly 1.0 (up to floating-point rounding), since the mean-centered vectors (-4/3, 2/3, 2/3) and (-2/3, 1/3, 1/3) are parallel and point in the same direction.
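
The claim about parallel vectors can be checked with exact arithmetic. The sketch below is only an illustration: the helper names `centered` and `are_parallel` are made up for this example and are not from the book or the answer above. It subtracts each user's mean rating and tests whether the two resulting vectors are scalar multiples of each other.

```python
from fractions import Fraction


def centered(values):
    """Subtract the mean from each value, using exact rational arithmetic."""
    vals = [Fraction(v) for v in values]
    mean = sum(vals) / len(vals)
    return [v - mean for v in vals]


def are_parallel(u, v):
    """Return True if u and v are scalar multiples of each other.

    Two vectors are parallel when every pairwise cross term
    u[i] * v[j] - u[j] * v[i] is zero.
    """
    return all(
        u[i] * v[j] == u[j] * v[i]
        for i in range(len(u))
        for j in range(i + 1, len(u))
    )


user1 = centered([3, 5, 5])  # ratings of user1 for item1..item3
user2 = centered([4, 5, 5])  # ratings of user2 for item1..item3

# A positive dot product means the vectors point the same way (r = +1);
# a negative one would mean opposite directions (r = -1).
dot = sum(a * b for a, b in zip(user1, user2))

print(user1, user2, are_parallel(user1, user2), dot > 0)
```

With three items where two ratings coincide for both users, the centered vectors are bound to be parallel, so this particular data set cannot reveal a bug in either implementation; the second example in the answer is a more useful test.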
