Tuning parameters for implicit pyspark.ml ALS matrix factorization model through pyspark.ml CrossValidator

前端未结

关注

 3  1590

臣服心动 2020-11-30 01:37

I\'m trying to tune the parameters of an ALS matrix factorization model that uses implicit data. For this, I\'m trying to use pyspark.ml.tuning.CrossValidator to run through

3条回答

广开言路 (楼主)

2020-11-30 02:07
Ignoring technical issues, strictly speaking neither method is correct given the input generated by ALS with implicit feedback.
- you cannot use RegressionEvaluator because, as you already know, prediction can be interpreted as a confidence value and is represented as a floating point number in range [0, 1] and label column is just an unbound integer. These values are clearly not comparable.
- you cannot use BinaryClassificationEvaluator because even if the prediction can be interpreted as probability label doesn't represent binary decision. Moreover prediction column has invalid type and couldn't be used directly with BinaryClassificationEvaluator
You can try to convert one of the columns so input fit the requirements but this is is not really a justified approach from a theoretical perspective and introduces additional parameters which are hard to tune.
- map label column to [0, 1] range and use RMSE.
- convert label column to binary indicator with fixed threshold and extend ALS / ALSModel to return expected column type. Assuming threshold value is 1 it could be something like this
```
from pyspark.ml.recommendation import *
from pyspark.sql.functions import udf, col
from pyspark.mllib.linalg import DenseVector, VectorUDT

class BinaryALS(ALS):
    def fit(self, df):
        assert self.getImplicitPrefs()
        model = super(BinaryALS, self).fit(df)
        return ALSBinaryModel(model._java_obj)

class ALSBinaryModel(ALSModel):
    def transform(self, df):
        transformed = super(ALSBinaryModel, self).transform(df)
        as_vector = udf(lambda x: DenseVector([1 - x, x]), VectorUDT())
        return transformed.withColumn(
            "rawPrediction", as_vector(col("prediction")))

# Add binary label column
with_binary = dfCounts.withColumn(
    "label_binary", (col("rating") > 0).cast("double"))

als_binary_model = BinaryALS(implicitPrefs=True).fit(with_binary)

evaluatorB = BinaryClassificationEvaluator(
    metricName="areaUnderROC", labelCol="label_binary")

evaluatorB.evaluate(als_binary_model.transform(with_binary))
## 1.0
```
Generally speaking, material about evaluating recommender systems with implicit feedbacks is kind of missing in textbooks, I suggest you take a read on eliasah's answer about evaluating these kind of recommenders.
0 讨论(0)

查看其它3个回答
发布评论:

提交评论
- 加载中...