matrix-factorization

Sympy: Solving Matrices in a finite field

家住魔仙堡 submitted on 2019-12-18 16:25:08
Question: For my project, I need to solve for a matrix X given matrices Y and K (XY = K). The elements of each matrix must be integers modulo a random 256-bit prime. My first attempt at solving this problem used SymPy's mod_inv(n) function. The problem with this is that I'm running out of memory with matrices of around size 30. My next thought was to perform matrix factorization, as that might be lighter on memory. However, SymPy seems to contain no solver that can find matrices modulo a number. Any
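A minimal sketch of one workable route, assuming SymPy's Matrix.inv_mod (which computes a matrix inverse modulo an integer): since XY = K implies X = K·Y⁻¹ (mod p), invert Y modulo the prime and multiply. The prime and matrices below are small stand-ins, not the asker's data.

    # Sketch: solve X*Y = K over GF(p) via a modular inverse of Y.
    # Assumes Y is invertible mod p (inv_mod raises otherwise).
    from sympy import Matrix

    p = 101                      # stand-in prime; replace with the 256-bit prime
    Y = Matrix([[3, 1], [5, 2]])
    K = Matrix([[7, 4], [2, 9]])

    Y_inv = Y.inv_mod(p)         # modular matrix inverse
    X = (K * Y_inv).applyfunc(lambda e: e % p)

    assert (X * Y).applyfunc(lambda e: e % p) == K.applyfunc(lambda e: e % p)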

Vowpal Wabbit: Cannot retrieve latent factors with gd_mf_weights from a trained --rank model

≡放荡痞女 submitted on 2019-12-11 15:23:21
Question: I trained a rank-40 model on the MovieLens data, but cannot retrieve the weights from the trained model with gd_mf_weights. I'm following the syntax from the VW matrix factorization example, but it gives me errors. Please advise. Model training call: vw --rank 40 -q ui --l2 0.1 --learning_rate 0.015 --decay_learning_rate 0.97 --power_t 0 --passes 50 --cache_file movielens.cache -f movielens.reg -d train.vw Weights generating call: library/gd_mf_weights -I train.vw -O '/data/home/mlteam
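If gd_mf_weights keeps failing, one hedged workaround (not the method from the VW example) is to have vw itself dump the learned weights in text form via --readable_model; for a --rank model the latent factors appear among these weights, though keyed by hash rather than by user/item id.

    # Load the trained regressor and write all weights to a text file.
    vw -i movielens.reg -t -d train.vw --readable_model movielens_weights.txt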

Debugging large task sizes in Spark MLlib

亡梦爱人 submitted on 2019-12-10 09:56:23
Question: In Apache Spark (Scala shell), I am attempting: val model = ALS.trainImplicit(training, rank, numIter) where training is a million-row file partitioned into 100 partitions, rank = 20, and numIter = 20. I get a string of messages of the form: WARN scheduler.TaskSetManager: Stage 2175 contains a task of very large size (101 KB). The maximum recommended task size is 100 KB. How do I go about debugging this? I've heard broadcast variables are useful in reducing task size, but in this case there's no
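One hedged thing to try before reaching for broadcast variables: split the ratings into more, smaller partitions so each task ships less data. A sketch reusing the question's own names (training, rank, numIter); the partition count is an arbitrary guess to tune.

    // Sketch: more partitions => smaller per-task payload; 400 is a guess.
    val repartitioned = training.repartition(400).cache()
    val model = ALS.trainImplicit(repartitioned, rank, numIter)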

Correct use of pivot in Cholesky decomposition of positive semi-definite matrix

Deadly submitted on 2019-12-05 06:59:26
I don't understand how to use the chol function in R to factor a positive semi-definite matrix. (Or I do, and there's a bug.) The documentation states: If pivot = TRUE, then the Choleski decomposition of a positive semi-definite x can be computed. The rank of x is returned as attr(Q, "rank"), subject to numerical errors. The pivot is returned as attr(Q, "pivot"). It is no longer the case that t(Q) %*% Q equals x. However, setting pivot <- attr(Q, "pivot") and oo <- order(pivot), it is true that t(Q[, oo]) %*% Q[, oo] equals x ... The following example seems to belie this description. > x <-
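For reference, a small R example of the documented pivot-and-reorder identity on a matrix that is genuinely positive semi-definite; the matrix below is a made-up rank-2 stand-in, not the asker's example.

    # Made-up PSD matrix of rank 2 (it equals t(B) %*% B for a 2x3 B).
    A <- matrix(c(1, 1, 0,
                  1, 2, 1,
                  0, 1, 1), nrow = 3)
    Q <- chol(A, pivot = TRUE)
    attr(Q, "rank")                      # should report 2
    oo <- order(attr(Q, "pivot"))
    max(abs(t(Q[, oo]) %*% Q[, oo] - A)) # ~0, up to numerical error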

Apache Spark ALS collaborative filtering results. They don't make sense

∥☆過路亽.° submitted on 2019-12-03 07:57:33
I wanted to try out Spark for collaborative filtering using MLlib, as explained in this tutorial: https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html The algorithm is based on the paper "Collaborative Filtering for Implicit Feedback Datasets" and does matrix factorization. Everything is up and running using the MovieLens 10M data set. The data set is split into 80% training, 10% test, and 10% validation. RMSE Baseline: 1.060505464225402 RMSE (train) = 0.7697248827452756 RMSE (validation) = 0.8057135933012889 for the model trained with rank = 24, lambda = 0.1,
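For comparison, a sketch of the RMSE computation along the lines of the tutorial (names assumed). One hedged caveat worth checking: trainImplicit predicts confidence-weighted preference scores rather than 1-5 ratings, so an explicit-style RMSE against raw ratings can look odd.

    // Sketch: RMSE of a MatrixFactorizationModel against an RDD[Rating].
    import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}
    import org.apache.spark.rdd.RDD

    def rmse(model: MatrixFactorizationModel, data: RDD[Rating]): Double = {
      val preds = model.predict(data.map(r => (r.user, r.product)))
                       .map(p => ((p.user, p.product), p.rating))
      val actuals = data.map(r => ((r.user, r.product), r.rating))
      math.sqrt(preds.join(actuals).values
        .map { case (p, a) => (p - a) * (p - a) }.mean())
    }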

Is there good library to do nonnegative matrix factorization (NMF) fast?

会有一股神秘感。 submitted on 2019-12-03 00:43:58
I have a sparse matrix whose shape is 570000*3000. I tried nimfa to do NMF (using the default nmf method, with max_iter set to 65). However, I found nimfa very slow. Has anyone used a faster library to do NMF? Answer (tskuzzy): I have used libNMF before. It's written in C and is very fast. There is a paper documenting the algorithm and code. The paper also lists several alternative packages for NMF in a bunch of different languages, which I have copied here for future reference: The Mathworks [3, 33] Matlab http://www.mathworks.com/access/helpdesk/help/toolbox/stats/nnmf . Cemgil [5] Matlab http://www
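As another option not named in the excerpt, scikit-learn also ships an NMF implementation that accepts scipy sparse input directly. A hedged sketch at the question's scale (the random matrix is a stand-in; a full-size run is still slow, just much faster than pure-Python loops):

    # Sketch: scikit-learn NMF on a 570000 x 3000 sparse matrix.
    import scipy.sparse as sp
    from sklearn.decomposition import NMF

    X = sp.random(570000, 3000, density=1e-4, format="csr", random_state=0)
    # scipy.sparse.random draws from U[0, 1), so X is already nonnegative.

    model = NMF(n_components=30, init="nndsvd", max_iter=65)
    W = model.fit_transform(X)  # shape (570000, n_components)
    H = model.components_       # shape (n_components, 3000)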

Evaluating the LightFM Recommendation Model

纵饮孤独 submitted on 2019-11-30 22:36:15
I've been playing around with lightfm for quite some time and found it really useful for generating recommendations. However, there are two main questions I would like answered: (1) To evaluate the LightFM model in cases where the rank of the recommendations matters, should I rely more on precision@k or on other provided evaluation metrics, such as the AUC score? (2) In what cases should I focus on improving precision@k over the other metrics? Or are they highly correlated, meaning that if I manage to improve my precision@k score the other metrics would follow? Am I correct? how would you
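For reference, LightFM ships both metrics in lightfm.evaluation, so comparing them on the same split is cheap. A minimal sketch, assuming model, train, and test are a fitted LightFM model and scipy sparse interaction matrices (names assumed, not from the question):

    from lightfm.evaluation import precision_at_k, auc_score

    # Both return one score per user; average for a model-level number.
    prec = precision_at_k(model, test, train_interactions=train, k=10).mean()
    auc = auc_score(model, test, train_interactions=train).mean()
    print(f"precision@10 = {prec:.4f}, AUC = {auc:.4f}")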