SparkML setParallelism for CrossValidator

Submitted by 拟墨画扇 on 2020-02-02 14:30:10

Question


I am trying to set up cross-validation using Spark ML, but I am getting an error saying:

"value setParallelism is not a member of org.apache.spark.ml.tuning.CrossValidator"

I am currently following the tutorial on the Spark page. I am new to this, so any help is appreciated. Below is my code snippet:

import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.Row
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Tokenizer
val tokenizer = new Tokenizer().setInputCol("tweet").setOutputCol("words")

// HashingTF
val hash_tf = new HashingTF().setInputCol(tokenizer.getOutputCol).setOutputCol("features")

// ML models
val l_regression = new LogisticRegression().setMaxIter(100).setRegParam(0.15)

// Pipeline
val pipe = new Pipeline().setStages(Array(tokenizer, hash_tf, l_regression))

// Parameter grid: 3 values for numFeatures x 3 values for regParam = 9 combinations
val paramGrid = new ParamGridBuilder()
  .addGrid(hash_tf.numFeatures, Array(10, 100, 1000))
  .addGrid(l_regression.regParam, Array(0.1, 0.01, 0.001))
  .build()

// Cross-validator; setParallelism is the call that fails to resolve
val c_validator = new CrossValidator()
  .setEstimator(pipe)
  .setEvaluator(new BinaryClassificationEvaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)
  .setParallelism(2)

Answer 1:


setParallelism is available only in Spark 2.3 or later, so you are most likely running an earlier version. The CrossValidator API documentation lists it as follows:

(expert-only) Parameter setters

(...)

def setParallelism(value: Int): CrossValidator.this.type

Set the maximum level of parallelism to evaluate models in parallel. Default is 1 for serial evaluation.

Annotations: @Since("2.3.0")
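
To confirm whether the running version is new enough, the version string reported by spark.version (or sc.version) in spark-shell can be compared against 2.3. The following is only a minimal sketch: SparkVersionCheck and isAtLeast are hypothetical names made up for illustration and are not part of Spark, and the parsing assumes a version string of the form "major.minor.patch" with an optional suffix such as "-SNAPSHOT".

```scala
// Hypothetical helper (not part of Spark): checks whether a version string
// such as "2.2.1" or "2.3.0-SNAPSHOT" is at least the given major.minor.
object SparkVersionCheck {
  def isAtLeast(version: String, major: Int, minor: Int): Boolean = {
    // Keep only the leading numeric components,
    // e.g. "2.3.0-SNAPSHOT" -> Array(2, 3, 0)
    val parts = version
      .split("[.-]")
      .takeWhile(p => p.nonEmpty && p.forall(_.isDigit))
      .map(_.toInt)
    val maj = parts.headOption.getOrElse(0)
    val min = if (parts.length > 1) parts(1) else 0
    maj > major || (maj == major && min >= minor)
  }

  def main(args: Array[String]): Unit = {
    // In spark-shell, pass spark.version instead of these literals.
    println(isAtLeast("2.2.1", 2, 3)) // false: setParallelism not available
    println(isAtLeast("2.3.0", 2, 3)) // true: setParallelism available
  }
}
```

In spark-shell this would be called as SparkVersionCheck.isAtLeast(spark.version, 2, 3). If it returns false, either upgrade the Spark dependency to 2.3.0 or later, or remove the .setParallelism(2) line; in the latter case the models are evaluated one at a time, which matches the default of 1 documented above.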



Source: https://stackoverflow.com/questions/49970460/sparkml-setparallelism-for-crossvalidator
