Understanding the Pearson Correlation Coefficient

给你一囗甜甜゛ posted on 2019-12-22 17:15:20

Question


As part of the calculations to generate a Pearson Correlation Coefficient, the following computation is performed:

In the second formula, p_a,i is the predicted rating that user a would give item i, n is the number of similar users being compared against, and r_u,i is the rating of item i by user u.

What value will be used if user u has not rated this item? Did I misunderstand anything here?


Answer 1:


According to the link, the earlier calculations in step 1 of the algorithm are over a set of items, indexed 1 to m, where m is the total number of items in common.

Step 3 of the algorithm specifies: "To find a rating prediction for a particular user for a particular item, first select a number of users with the highest, weighted similarity scores with respect to the current user that have rated on the item in question."

These calculations are performed only on the intersection of the users' sets of rated items. No calculation is performed for a user who has not rated the item.
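To make the step 3 rule concrete, here is a minimal Python sketch. The question's formula is not reproduced here, so the prediction below uses a plain similarity-weighted average as a stand-in, which is an assumption and may differ from the formula in the linked algorithm. The function name `predict_rating`, the dictionary layout, and the sample data are all hypothetical. The point it illustrates is only that users who have not rated the item are dropped before the top n neighbours are chosen, so no placeholder value is ever needed.

```python
from typing import Dict, Optional

# Hypothetical layout: user -> {item: rating}
Ratings = Dict[str, Dict[str, float]]


def predict_rating(ratings: Ratings,
                   similarities: Dict[str, float],
                   item: str,
                   n: int = 2) -> Optional[float]:
    """Predict a rating for `item` from the n most similar users who rated it.

    `similarities` maps each other user to a similarity score with respect
    to the current user. The weighted average is an assumed stand-in for
    the prediction formula, not necessarily the one in the question.
    """
    # Keep only users who actually rated the item; the rest are skipped.
    candidates = [(sim, user) for user, sim in similarities.items()
                  if item in ratings.get(user, {})]
    # Take the n candidates with the highest similarity scores.
    neighbours = sorted(candidates, reverse=True)[:n]
    total_weight = sum(abs(sim) for sim, _ in neighbours)
    if total_weight == 0:
        return None  # nobody usable has rated the item
    weighted_sum = sum(sim * ratings[user][item] for sim, user in neighbours)
    return weighted_sum / total_weight


ratings = {
    "u1": {"i1": 4.0},
    "u2": {"i1": 2.0},
    "u3": {"i2": 5.0},  # most similar user, but has not rated i1
}
similarities = {"u1": 0.9, "u2": 0.3, "u3": 0.95}

print(predict_rating(ratings, similarities, "i1"))
```

In this sample, u3 has the highest similarity but is ignored for item i1 because there is no rating to use, so the prediction is built from u1 and u2 only.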




Answer 2:


It only makes sense to calculate results if both users have rated a movie. Linear regression can be visualised as finding a straight line through a two-dimensional graph where one variable is plotted on the X axis and the other on the Y axis. Each pair of ratings is represented as a point [u1_rating, u2_rating] on a Euclidean plane. Since you cannot plot a point that has only one coordinate, you have to discard those cases.
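The idea of keeping only the points that have both coordinates can be sketched in Python as follows. This is an illustrative sketch rather than the linked algorithm's exact procedure: the function name `pearson` and the sample ratings are made up, and the means are taken over the co-rated items only, which is one common choice (some variants use each user's overall mean instead).

```python
from math import sqrt
from typing import Dict, Optional


def pearson(ratings_a: Dict[str, float],
            ratings_b: Dict[str, float]) -> Optional[float]:
    """Pearson correlation over the items both users have rated."""
    # Only items rated by both users give a full (x, y) point.
    common = sorted(set(ratings_a) & set(ratings_b))
    m = len(common)
    if m < 2:
        return None  # not enough shared points to define a correlation

    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # one user gave identical ratings; correlation undefined
    return cov / sqrt(var_x * var_y)


u1 = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 5.0}  # D is rated only by u1
u2 = {"A": 2.0, "B": 4.0, "C": 6.0, "E": 1.0}  # E is rated only by u2

print(pearson(u1, u2))
```

Items D and E each have a rating from only one user, so they never enter the sums. The correlation is computed from A, B and C alone.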



Source: https://stackoverflow.com/questions/6268956/understanding-the-pearson-correlation-coefficient
