How to do cross-validation in SparkR

Posted by 生来就可爱ヽ(ⅴ<●) on 2019-12-08 10:47:02

Question


I am working with the MovieLens dataset. I have an m × n matrix with user IDs as rows and movie IDs as columns, and I have applied a dimensionality-reduction technique (matrix factorization) to reduce my sparse matrix to m × k, where k < n. I want to evaluate performance using the k-nearest-neighbor algorithm (my own code, not a library). I am using SparkR 1.6.2. I don't know how to split my dataset into training data and test data in SparkR. I have tried native R functions (sample, subset, caret), but they are not compatible with Spark DataFrames. Kindly give some suggestions for performing cross-validation and training the classifier using my own function written in SparkR.
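For context, SparkR 1.6 itself does expose sample() and except() on Spark DataFrames, which can approximate a random train/test split without leaving SparkR. The following is only a sketch under that assumption (it needs a running Spark 1.6 installation; local_df is a placeholder for your own local data frame):

library(SparkR)

# Initialize a SparkContext and SQLContext (SparkR 1.6-era API).
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# local_df is a placeholder: any local R data frame, e.g. your m x k matrix
# converted with as.data.frame().
df <- createDataFrame(sqlContext, local_df)

# Roughly 80% of rows, sampled without replacement, for training.
train <- sample(df, withReplacement = FALSE, fraction = 0.8, seed = 42)

# The remaining rows (set difference) serve as the test set.
test <- except(df, train)

Note that fraction is only approximate (row inclusion is decided independently per row), and except() removes duplicate rows, so this split is inexact for data with repeated rows.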


Answer 1:


The sparklyr (https://spark.rstudio.com/) package provides simple functionality for partitioning data. For example, if we have a data frame called df in Spark we could create a copy of it with compute() then partition it with sdf_partition().

df_part <- df %>%
  compute("df_part") %>%
  sdf_partition(test = 0.2, train = 0.8, seed = 2017)

df_part would then be a list of references to the partitioned Spark DataFrames (one element per named partition, e.g. df_part$train and df_part$test). We could use collect() to copy a Spark DataFrame into a local R data frame.
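Putting the pieces above together, a minimal end-to-end sketch might look like the following (it assumes a local Spark installation reachable via spark_connect(), and uses iris purely as stand-in data):

library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (assumes Spark is installed locally).
sc <- spark_connect(master = "local")

# Copy a local data frame into Spark; "df" is the name it gets in Spark.
df <- sdf_copy_to(sc, iris, "df", overwrite = TRUE)

# Randomly partition into named train/test Spark DataFrames.
df_part <- df %>%
  sdf_partition(train = 0.8, test = 0.2, seed = 2017)

# Bring each partition back as a local R data frame, e.g. to feed a
# hand-written k-nearest-neighbor routine.
train_local <- collect(df_part$train)
test_local  <- collect(df_part$test)

spark_disconnect(sc)

Collecting is only sensible once the data fits in local memory; for full k-fold cross-validation you could call sdf_partition() with k equal fractions and rotate which partition serves as the test fold.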



Source: https://stackoverflow.com/questions/40373510/how-to-do-cross-validation-in-sparkr
