Question
The code I used to train the decision tree is as follows:
import org.apache.spark.SparkContext
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.tree.configuration.Algo._
import org.apache.spark.mllib.tree.impurity.Gini
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.evaluation.MulticlassMetrics
// Load and parse the data file
val data = sc.textFile("data/mllib/spt.csv")
val parsedData = data.map { line =>
  val parts = line.split(',').map(_.toDouble)
  LabeledPoint(parts(0), Vectors.dense(parts.tail))
}
//Split the data
val splits = parsedData.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))
// Train a DecisionTree model.
// Empty categoricalFeaturesInfo indicates all features are continuous.
val numClasses = 2
val categoricalFeaturesInfo = Map[Int, Int]()
val impurity = "gini"
val maxDepth = 5
val maxBins = 32
val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,
  impurity, maxDepth, maxBins)
val labelAndPreds = trainingData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
//Training error
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / trainingData.count
println("Training Error = " + trainErr)
//Model Output
println("Learned classification tree model:\n" + model)
println("Learned classification tree model:\n" + model.toDebugString)
I want to write the output of model.toDebugString to a text file. I found many answers to similar questions, but none that address this specifically. Any specific help or pointers would be appreciated. Since I am new to Scala, I am having trouble working out which libraries to include.
I tried the code below:
modelFile = ~/decisionTreeModel.txt"
f = open(modelFile,"w")
f.write(model.toDebugString())
f.close()
but it was giving me this error:
<console>:1: error: ';' expected but '.' found.
modelFile = ~/decisionTreeModel.txt"
^
<console>:1: error: unclosed string literal
modelFile = ~/decisionTreeModel.txt"
^
I also tried to save the model:
// Save and load model
model.save(sc, "myModelPath")
val sameModel = DecisionTreeModel.load(sc, "myModelPath")
The above code also threw errors. Thanks for any help or suggestions.
Answer 1:
Try this (for example on the shell):
snow:~ mkamp$ spark-shell
...
scala> val rdd = sc.parallelize(List(1,2,3))
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:15
scala> new java.io.PrintWriter("/tmp/decisionTreeModel.txt") { write(rdd.toDebugString); close }
res0: java.io.PrintWriter = $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anon$1@65fc2639
Then, on the command line (outside of Spark):
snow:~ mkamp$ cat /tmp/decisionTreeModel.txt
(4) ParallelCollectionRDD[0] at parallelize at <console>:15 []
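The snippet tried in the question is Python-style, which is why the Scala shell rejects it. A minimal Scala sketch of the same idea follows, using only java.io.PrintWriter as in the answer above. The object name SaveDebugString, the helper writeTextFile, and the placeholder string are hypothetical and introduced only for illustration; in the Spark shell you would pass model.toDebugString as the contents instead. Note that the JVM does not expand "~" in paths, so the sketch builds the home-directory path from the user.home system property.

```scala
import java.io.{File, PrintWriter}

object SaveDebugString {
  // Hypothetical helper: writes `contents` to `path` and closes the
  // writer even if the write throws.
  def writeTextFile(path: String, contents: String): Unit = {
    val writer = new PrintWriter(new File(path))
    try writer.write(contents)
    finally writer.close()
  }

  def main(args: Array[String]): Unit = {
    // Placeholder text standing in for model.toDebugString,
    // so the sketch runs without a Spark context.
    val debugString = "placeholder for model.toDebugString\n"

    // "~" is not expanded by the JVM; resolve the home directory explicitly.
    val target = new File(System.getProperty("user.home"), "decisionTreeModel.txt")

    writeTextFile(target.getPath, debugString)
    println("Wrote " + target.getPath)
  }
}
```

In the Spark shell, with the model from the question already trained, the equivalent call would be SaveDebugString.writeTextFile(path, model.toDebugString), where path is a location on the driver's local file system.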
Source: https://stackoverflow.com/questions/33183857/saving-model-output-from-decision-tree-train-classifier-as-a-text-file-in-spark