I'm trying to tune the parameters of an ALS matrix factorization model that uses implicit data. For this, I'm trying to use pyspark.ml.tuning.CrossValidator to run through
Ignoring technical issues, strictly speaking neither method is correct given the input generated by ALS with implicit feedback.
RegressionEvaluator, because, as you already know, the prediction can be interpreted as a confidence value and is represented as a floating point number in the range [0, 1], while the label column is just an unbounded integer. These values are clearly not comparable.
BinaryClassificationEvaluator, because even if the prediction can be interpreted as a probability, the label doesn't represent a binary decision. Moreover, the prediction column has an invalid type and can't be used directly with BinaryClassificationEvaluator.
You can try to convert one of the columns so that the input fits the requirements, but this is not really a justified approach from a theoretical perspective, and it introduces additional parameters which are hard to tune:
Map the label column to the [0, 1] range and use RMSE.
Convert the label column to a binary indicator with a fixed threshold, and extend ALS / ALSModel to return the expected column type. Assuming the threshold value is 1, it could be something like this:
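from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.recommendation import ALS, ALSModel
from pyspark.sql.functions import udf, col
# On Spark 2.0 and later, pyspark.ml expects the pyspark.ml.linalg
# versions of these types instead.
from pyspark.mllib.linalg import DenseVector, VectorUDT

class BinaryALS(ALS):
    def fit(self, df):
        # Only meaningful for implicit feedback
        assert self.getImplicitPrefs()
        model = super(BinaryALS, self).fit(df)
        # Wrap the fitted Java model in the binary-aware model class
        return ALSBinaryModel(model._java_obj)

class ALSBinaryModel(ALSModel):
    def transform(self, df):
        transformed = super(ALSBinaryModel, self).transform(df)
        # Turn the scalar prediction x into a two-class vector [1 - x, x]
        as_vector = udf(lambda x: DenseVector([1 - x, x]), VectorUDT())
        return transformed.withColumn(
            "rawPrediction", as_vector(col("prediction")))

# Add binary label column (for integer counts, > 0 is the same as >= 1).
# dfCounts is the input DataFrame with a "rating" column (not defined here).
with_binary = dfCounts.withColumn(
    "label_binary", (col("rating") > 0).cast("double"))

als_binary_model = BinaryALS(implicitPrefs=True).fit(with_binary)

evaluatorB = BinaryClassificationEvaluator(
    metricName="areaUnderROC", labelCol="label_binary")

evaluatorB.evaluate(als_binary_model.transform(with_binary))
## 1.0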
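For comparison, the first option above (mapping the label to the [0, 1] range and using RMSE) can be sketched in plain Python, without Spark, just to show the arithmetic. This is only an illustration under assumptions: min-max scaling is one possible mapping among many (which is exactly the kind of extra parameter that is hard to tune), and the function names and sample values are hypothetical.

```python
import math


def min_max_scale(values):
    """Linearly rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rmse(predictions, labels):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    squared = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical data: raw interaction counts and model confidence scores.
counts = [0, 1, 3, 10]
predictions = [0.1, 0.2, 0.4, 0.9]

scaled_labels = min_max_scale(counts)
error = rmse(predictions, scaled_labels)
print(scaled_labels, error)
```

Note that with min-max scaling a single very large count squeezes all other labels toward 0, so the resulting RMSE depends heavily on the chosen mapping.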
Generally speaking, material about evaluating recommender systems with implicit feedback is largely missing from textbooks. I suggest you read eliasah's answer about evaluating this kind of recommender.