Why does a spark-ml ALS model return NaN and negative predictions?

孤者浪人 submitted on 2019-12-10 16:18:46

Question


I'm trying to use ALS from spark-ml with implicit ratings.

I noticed that some of the predictions given by my trained model are negative or NaN. Why is that?


Answer 1:


Apache Spark provides an option to enforce a non-negativity constraint on ALS.

So, to remove these negative values, you just need to set:

Python:

nonnegative=True

Scala:

setNonnegative(true)

when creating your ALS model, i.e.:

>>> from pyspark.ml.recommendation import ALS
>>> als = ALS(rank=10, maxIter=5, seed=0, nonnegative=True)

Non-negative matrix factorization (NMF or NNMF), also called non-negative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have non-negative elements [Ref. Wikipedia].

If you want to read more about NMF, I'd recommend the following paper:
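To see why the constraint removes negative predictions, here is a minimal pure-Python sketch (not Spark code). It assumes a prediction is the dot product of a user factor vector and an item factor vector; the `predict` helper and all factor values are hypothetical and chosen by hand for illustration.

```python
def predict(user_factors, item_factors):
    """Predicted score as the dot product of the two factor vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))

# Hypothetical factor vectors, picked by hand (not learned by ALS).
unconstrained_user = [0.5, -1.0]   # unconstrained factors may be negative
unconstrained_item = [0.5, 1.0]
nonneg_user = [0.5, 1.0]           # every entry is >= 0
nonneg_item = [0.5, 1.0]

# A negative factor can pull the dot product below zero.
negative_score = predict(unconstrained_user, unconstrained_item)

# With non-negative factors every term is >= 0, so the sum is too.
nonneg_score = predict(nonneg_user, nonneg_item)

print(negative_score, nonneg_score)
```

When every entry of both factor vectors is non-negative, each product in the sum is non-negative, so the prediction cannot drop below zero.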

  • Collaborative Filtering via Ensembles of Matrix Factorizations

As for the NaN values, they usually come from splitting your dataset: if a user or an item appears only in the test set and not in the training set, the model has never seen it and has no factors for it, so it cannot produce a prediction. The same can happen if you cross-validate your training. A couple of JIRAs about this are marked as resolved for 2.2:

  • https://issues.apache.org/jira/browse/SPARK-14489
  • https://issues.apache.org/jira/browse/SPARK-19345

These allow you to set the cold start strategy to use when creating your model.
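The cold start effect can be sketched in pure Python (this is a toy model, not Spark's implementation). It assumes the model keeps one factor vector per user and per item seen in training; the IDs, factor values and `predict` helper are hypothetical. In Spark 2.2 and later, the corresponding ALS setting is `coldStartStrategy`, where "drop" removes rows with NaN predictions, similar to the filter at the end of this sketch.

```python
import math

# Hypothetical factors learned from the training split only.
user_factors = {1: [1.0, 0.5], 2: [0.5, 2.0]}
item_factors = {10: [2.0, 1.0], 11: [1.0, 1.0]}

def predict(user, item):
    """Dot product of the factors, or NaN if either ID was unseen in training."""
    if user not in user_factors or item not in item_factors:
        return float("nan")
    return sum(u * v for u, v in zip(user_factors[user], item_factors[item]))

# Test split: user 3 and item 12 never appeared in the training split.
test_pairs = [(1, 10), (2, 11), (3, 10), (1, 12)]
predictions = [(u, i, predict(u, i)) for u, i in test_pairs]

# "drop"-style handling: discard NaN rows before computing any metric.
kept = [row for row in predictions if not math.isnan(row[2])]

print(predictions)
print(kept)
```

Without the final filter, a single NaN row is enough to turn an averaged metric such as RMSE into NaN, which is why evaluation on a random split can fail.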



Source: https://stackoverflow.com/questions/44911349/why-does-spark-ml-als-model-returns-nan-and-negative-numbers-predictions
