RDD transformations and actions can only be invoked by the driver

时间秒杀一切 提交于 2019-11-30 22:46:52
zero323

MatrixFactorizationModel is a distributed model so you cannot simply call it from an action or a transformation. The closest thing to what you do here is something like this:

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}

def computeRatio(model: MatrixFactorizationModel, testUsers: RDD[Rating]) = {
  val testData = testUsers.map(r => (r.user, r.product)).groupByKey
  val n = testData.count

  val recommendations = model
     .recommendProductsForUsers(20)
     .mapValues(_.map(r => r.product))

  val hits = testData
    .join(recommendations)
    .values
    .map{case (xs, ys) => xs.toSet.intersect(ys.toSet).size}
    .sum

  hits / n
}

Notes:

  • distinct is an expensive operation and completely obsoletely here since you can obtain the same information from a grouped data
  • instead of groupBy followed by projection (map), project first and group later. There is no reason to transfer full ratings if you want only a product ids.
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!