The prediction time of spark matrix factorization

萝らか妹 提交于 2019-12-12 04:59:26

问题


I have simple Python app. take ratings.csv which has user_id, product_id, rating which contains 4 M record then I use Spark AlS and save the model, then I load it to matrixFactorization.

my problem with method predicts which takes more than one second to predict the rating between user and product. my server is 32 G and 8 cores. any suggestion how I can enhance the prediction time to be less than 100milisecond. and what the relationship between a number of records in the data set and the prediction time.

Here is what I am doing :

spark_config = SparkConf().setAll([('spark.executor.memory', '32g'), ('spark.cores.max', '8')]) 
als_recommender.sc = SparkContext(conf=spark_config) #training_data is array of tulips of 4 M record 
training_data = als_recommender.sc.parallelize(training_data) als_recommender.model = ALS.trainImplicit(training_data, 10, 10, nonnegative=True) 
als_recommender.model.save(als_recommender.sc, "....Ameer/als_model") 
als_recommender_model = MatrixFactorizationModel.load(als_recommender.sc, "....Ameer/als_model") 
als_recommender_model.predict(1,2913)

回答1:


Basically, you do not want to have to load the full model everytime you need to answer.

Depending on the model update frequency and in the number of prediction queries, I would either :

  • keep the model in memory and being able to answer to queries from there. For answer < 100ms, you will need to measure each step. Livy can be a good catch but I am not sure on its overhead.
  • output the top X predictions for each user and store them in DB. Redis is a good candidate as its fast, values can be a list


来源:https://stackoverflow.com/questions/45818401/the-prediction-time-of-spark-matrix-factorization

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!