collaborative-filtering

Why does ALS.trainImplicit give better predictions for explicit ratings?

Question: Edit: I tried a standalone Spark application (instead of PredictionIO) and my observations are the same. So this is not a PredictionIO issue, but it is still confusing. I am using PredictionIO 0.9.6 and the Recommendation template for collaborative filtering. The ratings in my data set are numbers between 1 and 10. When I first trained a model with the defaults from the template (using ALS.train), the predictions were horrible, at least subjectively. Scores ranged up to 60.0 or so but the

Get wrong recommendation with ALS.recommendation

Question: I wrote a Spark program for making recommendations, using the ALS recommendation library. I ran a small test with the following dataset, called trainData:
(u1, m1, 1)
(u1, m4, 1)
(u2, m2, 1)
(u2, m3, 1)
(u3, m1, 1)
(u3, m3, 1)
(u3, m4, 1)
(u4, m3, 1)
(u4, m4, 1)
(u5, m2, 1)
(u5, m4, 1)
The first column contains the user, the second contains the item rated by that user, and the third contains the rating. In my code, written in Scala, I trained the model using: myModel = ALS.trainImplicit

How can I build a CoordinateMatrix in Spark using a DataFrame?

Question: I am trying to use the Spark implementation of the ALS algorithm for recommendation systems, so I built the DataFrame depicted below as training data:
|--------------|--------------|--------------|
|    userId    |    itemId    |    rating    |
|--------------|--------------|--------------|
Now, I would like to create a sparse matrix to represent the interactions between every user and every item. The matrix will be sparse because if there is no interaction between a user and an item, the corresponding value
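As a rough illustration of the coordinate representation the question is after, here is a minimal plain-Python sketch that stores only the observed (userId, itemId, rating) triples. It is not Spark code: Spark's CoordinateMatrix is built from an RDD of (row, column, value) entries, and this sketch only mirrors that idea with a dictionary. The sample rows and the helper names `to_coordinate_entries`, `matrix_shape` and `lookup` are assumptions made up for the example.

```python
# Rows shaped like the DataFrame in the question: (userId, itemId, rating).
# The values below are invented for illustration only.
rows = [(0, 0, 4.0), (0, 2, 1.0), (1, 1, 5.0), (2, 0, 3.0)]


def to_coordinate_entries(rows):
    """Return {(userId, itemId): rating}, keeping only observed interactions."""
    entries = {}
    for user_id, item_id, rating in rows:
        entries[(int(user_id), int(item_id))] = float(rating)
    return entries


def matrix_shape(entries):
    """Infer (num_users, num_items) from the largest indices seen."""
    num_users = 1 + max(i for i, _ in entries)
    num_items = 1 + max(j for _, j in entries)
    return num_users, num_items


def lookup(entries, user_id, item_id, default=0.0):
    """Read one cell; pairs with no interaction fall back to `default`."""
    return entries.get((user_id, item_id), default)


entries = to_coordinate_entries(rows)
print(matrix_shape(entries))   # dimensions implied by the indices
print(lookup(entries, 1, 2))   # an unobserved user/item pair
```

Only four of the nine cells are stored here, which is the point of a coordinate layout: memory grows with the number of interactions rather than with users times items.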

Distribution among users for collaborative voting algorithm

Question: Users of my application (it's a game, actually) answer questions to get points. Questions are supplied by other users. Due to the volume, I cannot check everything myself, so I decided to crowd-source the filtering process to the users (players). The rules are simple:
- each user is shown a question to rate as good/bad/unsure
- when a question is rated 5 times as "bad", it is removed from the pool
- when a question is rated 5 times as "good", it is removed from the pool and flagged to be played by other
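The voting rules quoted in this excerpt can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Question` class, the status names "pending", "rejected" and "approved", and the choice to ignore votes that arrive after a decision are all hypothetical and not taken from the original post. The threshold of 5 is the one stated in the rules.

```python
from collections import Counter
from dataclasses import dataclass, field

VOTE_THRESHOLD = 5  # from the rules: 5 "bad" or 5 "good" votes decide a question


@dataclass
class Question:
    text: str
    votes: Counter = field(default_factory=Counter)
    status: str = "pending"  # hypothetical states: pending, rejected, approved


def rate(question: Question, vote: str) -> str:
    """Record one vote and return the question's status afterwards."""
    if vote not in {"good", "bad", "unsure"}:
        raise ValueError(f"unknown vote: {vote!r}")
    if question.status != "pending":
        return question.status  # already decided; late votes are ignored
    question.votes[vote] += 1
    if question.votes["bad"] >= VOTE_THRESHOLD:
        question.status = "rejected"   # removed from the rating pool
    elif question.votes["good"] >= VOTE_THRESHOLD:
        question.status = "approved"   # removed from the pool, flagged as playable
    return question.status


q = Question("What is the capital of France?")
for _ in range(5):
    rate(q, "good")
print(q.status, dict(q.votes))
```

"unsure" votes are counted but never trigger a decision in this sketch, so a question that only collects them stays pending indefinitely; whether that is acceptable depends on how questions are distributed among raters.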

Python Non negative Matrix Factorization that handles both zeros and missing data?

Question: I am looking for an NMF implementation that has a Python interface and handles both missing data and zeros. I don't want to impute my missing values before starting the factorization; I want them to be ignored in the minimized function. It seems that neither scikit-learn, nor nimfa, nor graphlab, nor mahout offers such an option. Thanks!

Answer 1: Using this Matlab to Python code conversion sheet I was able to rewrite NMF from the Matlab toolbox library. I had to decompose a 40k x 1k matrix with sparsity
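To make the requirement concrete, here is a small pure-Python sketch of one way to ignore missing entries: multiplicative updates for the squared error, restricted by a binary mask to the observed cells. It is not the Matlab-derived code the answer refers to, and it is far too slow for a 40k x 1k matrix; it only shows that zeros can be treated as real observations while `None` entries are left out of the objective. The function names, the use of `None` for missing values and the sample matrix are assumptions made for the example.

```python
import random


def masked_error(X, W, H):
    """Sum of squared errors over observed (non-None) entries only."""
    rank = len(H)
    total = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            if x is not None:
                pred = sum(W[i][k] * H[k][j] for k in range(rank))
                total += (x - pred) ** 2
    return total


def masked_nmf(X, rank, iterations=200, eps=1e-9, seed=0):
    """Factor X ~ W @ H with W, H >= 0, skipping entries of X that are None."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def masked_product():
        # W @ H at observed positions; missing positions stay None.
        return [[sum(W[i][k] * H[k][j] for k in range(rank))
                 if X[i][j] is not None else None
                 for j in range(m)] for i in range(n)]

    for _ in range(iterations):
        # Update W using only the observed columns of each row.
        P = masked_product()
        for i in range(n):
            obs = [j for j in range(m) if X[i][j] is not None]
            for k in range(rank):
                num = sum(X[i][j] * H[k][j] for j in obs)
                den = sum(P[i][j] * H[k][j] for j in obs)
                W[i][k] *= num / (den + eps)
        # Update H using only the observed rows of each column.
        P = masked_product()
        for j in range(m):
            obs = [i for i in range(n) if X[i][j] is not None]
            for k in range(rank):
                num = sum(W[i][k] * X[i][j] for i in obs)
                den = sum(W[i][k] * P[i][j] for i in obs)
                H[k][j] *= num / (den + eps)
    return W, H


# 0.0 is an observed zero; None is a missing value that must be ignored.
X = [[5.0, 3.0, 0.0, None],
     [4.0, None, 0.0, 1.0],
     [1.0, 1.0, None, 5.0],
     [None, 1.0, 5.0, 4.0]]
W, H = masked_nmf(X, rank=2)
print(masked_error(X, W, H))
```

Because every update multiplies a non-negative factor by a non-negative ratio, W and H stay non-negative without any explicit projection step.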

How to set values for calculating Euclidean distance and correlation

Question: Here is my word vector:
google test stackoverflow yahoo
I have assigned a value to each of these words as follows:
google : 1
test : 2
stackoverflow : 3
yahoo : 4
Here are some sample users and their words:
user1 google, test, stackoverflow
user2 test, google
user3 test, yahoo
user4 stackoverflow, yahoo
user5 stackoverflow, google
user6
To cater for users who do not have a value contained in the word vector, I assign '0'. Based on this, this corresponds to:
user1 1, 2, 3
user2 2, 1, 0
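For reference, the two measures named in the title can be computed on the user1 and user2 vectors from the excerpt with the standard library alone. This is a minimal sketch: the function names are assumptions, and returning `None` for a constant vector (where Pearson correlation is undefined) is a choice made for the example.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation, or None when either vector has zero variance."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


user1 = [1, 2, 3]  # vectors as encoded in the question
user2 = [2, 1, 0]

print(euclidean_distance(user1, user2))   # sqrt(11), about 3.317
print(pearson_correlation(user1, user2))  # -1.0
```

If a user with no words is encoded as all zeros under the question's scheme, that vector is constant and its correlation with any other user is undefined, which is why the sketch returns `None` in that case.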

MLlib MatrixFactorizationModel recommendProducts(user, num) failing on some users

Question: I trained a MatrixFactorizationModel using ALS.train() and am now using model.recommendProducts(user, num) to get the top recommended products, but the code fails for some users with the following error:
    user_products = model.call("recommendProducts", user, prodNum)
  File "/usr/lib/spark/python/pyspark/mllib/common.py", line 136, in call
    return callJavaFunc(self._sc, getattr(self._java_model, name), *a)
  File "/usr/lib/spark/python/pyspark/mllib/common.py", line 113, in callJavaFunc
    return

Apache Spark ALS - how to perform Live Recommendations / fold-in anonym user

Question: I am using Apache Spark's MLlib ALS (through the PySpark API for Python) to develop a service that performs live recommendations for anonymous users (users not in the training set) on my site. In my use case I train the model on the user ratings in this way:
    from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
    ratings = df.map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
    rank = 10
    numIterations = 10
    model = ALS.trainImplicit(ratings, rank, numIterations)
Now, each time an

Open Source collaborative filtering frameworks

Question: I was wondering whether there are any open source frameworks that would help me add the following kinds of functionality to my website:
1) If I am viewing a particular product, I would like to see what other products may be interesting to me. This information may be deduced by calculating, for example, what other people in my region (or sharing any other characteristic of my profile) bought in addition to the product that I am viewing. Kind of like what Amazon.com does.
2) Deduce relationships between

Building a Collaborative filtering / Recommendation System

Question: I'm in the process of designing a website built around the concept of recommending various items to users based on their tastes (i.e. items they've rated, items added to their favorites list, etc.). Some examples of this are Amazon, MovieLens, and Netflix. Now, my problem is that I'm not sure where to start with regard to the mathematical part of this system. I'm willing to learn the math that's required; I just don't know what type of math is required. I've looked at a few of the