Saving model output from a decision tree classifier as a text file in Spark (Scala)

Question

The code I used to train the decision tree is as follows:

    import org.apache.spark.SparkContext 
    import org.apache.spark.mllib.tree.DecisionTree      
    import org.apache.spark.mllib.regression.LabeledPoint   
    import org.apache.spark.mllib.linalg.Vectors  
    import org.apache.spark.mllib.tree.configuration.Algo._  
    import org.apache.spark.mllib.tree.impurity.Gini   
    import org.apache.spark.mllib.util.MLUtils   
    import org.apache.spark.mllib.evaluation.MulticlassMetrics

    // Load and parse the data file
    val data = sc.textFile("data/mllib/spt.csv")
    val parsedData = data.map { line =>
      val parts = line.split(',').map(_.toDouble)
      LabeledPoint(parts(0), Vectors.dense(parts.tail))
    }

    // Split the data into training (70%) and test (30%) sets
    val splits = parsedData.randomSplit(Array(0.7, 0.3))
    val (trainingData, testData) = (splits(0), splits(1))

    // Train a DecisionTree model.
    // Empty categoricalFeaturesInfo indicates all features are continuous.
    val numClasses = 2
    val categoricalFeaturesInfo = Map[Int, Int]()
    val impurity = "gini"
    val maxDepth = 5
    val maxBins = 32

    val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,
      impurity, maxDepth, maxBins)

    // Pair each training label with the model's prediction
    val labelAndPreds = trainingData.map { point =>
      val prediction = model.predict(point.features)
      (point.label, prediction)
    }

    // Training error
    val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / trainingData.count
    println("Training Error = " + trainErr)

    // Model output
    println("Learned classification tree model:\n" + model)

    println("Learned classification tree model:\n" + model.toDebugString)

I want to write the output of "model.toDebugString" to a text file. I found many answers to similar questions, but none specific to this case, so a concrete pointer would be a great help. Since I am new to Scala, I am having trouble working out which libraries to include.

I tried the code below:

    modelFile = ~/decisionTreeModel.txt"
    f = open(modelFile,"w") 
    f.write(model.toDebugString())
    f.close()

but it was giving me this error:

    <console>:1: error: ';' expected but '.' found.
           modelFile = ~/decisionTreeModel.txt"
                                          ^
    <console>:1: error: unclosed string literal
           modelFile = ~/decisionTreeModel.txt"
                                               ^

I also tried to save the model:

    // Save and load model
    model.save(sc, "myModelPath")
    val sameModel = DecisionTreeModel.load(sc, "myModelPath")

The above code also threw errors. Thanks for any help or suggestions.

Answer 1:

Try this (for example on the shell):

    snow:~ mkamp$ spark-shell

    ...

    scala> val rdd = sc.parallelize(List(1,2,3))
    rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:15

    scala> new java.io.PrintWriter("/tmp/decisionTreeModel.txt") { write(rdd.toDebugString); close }
    res0: java.io.PrintWriter = $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anon$1@65fc2639

Then, on the command line (outside of Spark):

    snow:~ mkamp$ cat /tmp/decisionTreeModel.txt
    (4) ParallelCollectionRDD[0] at parallelize at <console>:15 []
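
The same idea carries over to the decision tree from the question. Below is a minimal sketch, not a definitive implementation: `DebugStringWriter` and `writeText` are hypothetical names introduced here for illustration, and a placeholder string stands in for `model.toDebugString` so the example runs without a Spark context. It also builds the home-directory path explicitly, on the assumption that the target is the `~/decisionTreeModel.txt` file from the question; the JVM does not expand `~` the way a shell does.

```scala
import java.io.PrintWriter
import java.nio.file.{Files, Paths}

// Hypothetical helper object; the name is not part of Spark or the original answer.
object DebugStringWriter {
  // Writes `text` to the file at `path`, closing the writer even if the write fails.
  def writeText(path: String, text: String): Unit = {
    val writer = new PrintWriter(path, "UTF-8")
    try writer.write(text)
    finally writer.close()
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for model.toDebugString so the sketch runs without a Spark context.
    val debugString = "placeholder for model.toDebugString\n"

    // The JVM does not expand "~", so resolve the home directory explicitly.
    val target = Paths.get(System.getProperty("user.home"), "decisionTreeModel.txt").toString

    writeText(target, debugString)

    // Read the file back to confirm the round trip.
    val readBack = new String(Files.readAllBytes(Paths.get(target)), "UTF-8")
    println(readBack == debugString)
  }
}
```

In the spark-shell, after training, the call would be `DebugStringWriter.writeText(target, model.toDebugString)`. Note that `toDebugString` is written without parentheses, and that the file lands on the local filesystem of the machine running the driver.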

Source: https://stackoverflow.com/questions/33183857/saving-model-output-from-decision-tree-train-classifier-as-a-text-file-in-spark

Tags

scala

apache-spark

decision-tree