I have to use this code:
val dt = new DecisionTreeClassifier().setLabelCol("indexedLabel").setFeaturesCol("indexedFeatures").setImpurity(impurity).setMaxBins(maxBins).setMaxDepth(maxDepth);
I need to add categorical features information so that the decision tree doesn't treat the indexedCategoricalFeatures
as numerical. I have this map:
val categoricalFeaturesInfo = Map(143 -> 126, 144 -> 5, 145 -> 216, 146 -> 100, 147 -> 14, 148 -> 8, 149 -> 19, 150 -> 7);
However, it only works with the DecisionTree.trainClassifier method. I can't use this method because it accepts different arguments than the ones I have... I would really like to be able to use the DecisionTreeClassifier with categorical features treated properly.
Thank you for your help!
zero323
You're mixing two different APIs which take different approaches to categorical data:
RDD-based o.a.s.mllib, which provides the required metadata by passing a categoricalFeaturesInfo map.
Dataset (DataFrame) based o.a.s.ml, which uses column metadata to determine variable types. If you correctly use ML transformers to create features, this should be handled automatically for you; otherwise you'll have to provide the metadata manually.
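For the manual route, here is a rough sketch of how the map from the question could be turned into column metadata for the o.a.s.ml API. It is only an illustration under a few assumptions: the feature vectors are assumed to have 151 slots (indices 0 to 150, since the true size isn't given in the question), every index not in the map is assumed to be numeric, and the helper name withCategoricalMetadata is made up for this example.

```scala
import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NominalAttribute, NumericAttribute}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Feature index -> number of categories, copied from the question.
val categoricalFeaturesInfo = Map(143 -> 126, 144 -> 5, 145 -> 216, 146 -> 100,
  147 -> 14, 148 -> 8, 149 -> 19, 150 -> 7)

// Assumption: vectors have 151 slots; replace with the real vector size.
val numFeatures = 151

// One attribute per slot: nominal for indices in the map, numeric otherwise.
val attrs: Array[Attribute] = Array.tabulate(numFeatures) { i =>
  categoricalFeaturesInfo.get(i) match {
    case Some(numValues) => NominalAttribute.defaultAttr.withNumValues(numValues)
    case None            => NumericAttribute.defaultAttr
  }
}

// Hypothetical helper: re-aliases the features column with the metadata attached.
def withCategoricalMetadata(df: DataFrame, featuresCol: String): DataFrame = {
  val metadata = new AttributeGroup(featuresCol, attrs).toMetadata()
  df.withColumn(featuresCol, col(featuresCol).as(featuresCol, metadata))
}
```

Something like withCategoricalMetadata(data, "indexedFeatures") would then be applied before fitting the DecisionTreeClassifier. Note that maxBins has to be at least as large as the largest number of categories, so at least 216 with this map.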
Source: https://stackoverflow.com/questions/38881853/using-categoricalfeaturesinfo-with-decisiontreeclassifier-method-in-spark