Question
I have to use this code:
val dt = new DecisionTreeClassifier().setLabelCol("indexedLabel").setFeaturesCol("indexedFeatures").setImpurity(impurity).setMaxBins(maxBins).setMaxDepth(maxDepth);
I need to add categorical feature information so that the decision tree doesn't treat indexedCategoricalFeatures as numerical. I have this map (feature index -> number of categories):
val categoricalFeaturesInfo = Map(143 -> 126, 144 -> 5, 145 -> 216, 146 -> 100, 147 -> 14, 148 -> 8, 149 -> 19, 150 -> 7);
However, it only works with the DecisionTree.trainClassifier method. I can't use that method because it takes different arguments from the ones I have. I would really like to be able to use DecisionTreeClassifier with the categorical features treated properly.
Thank you for your help!
Answer 1:
You're mixing two different APIs, which take different approaches to categorical data:
- The RDD-based o.a.s.mllib (org.apache.spark.mllib), which gets the required metadata from the categoricalFeaturesInfo map you pass in.
- The Dataset (DataFrame) based o.a.s.ml (org.apache.spark.ml), which uses column metadata to determine variable types. If you use ML transformers correctly to create features, this should be handled automatically for you; otherwise you'll have to provide the metadata manually.
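A minimal sketch of the manual route, assuming the features column is a vector column named indexedFeatures whose categorical slots already hold 0-based category indices. The helper names (CategoricalMetadata, featureAttributes, withCategoricalMetadata) are hypothetical, and the vector length must be supplied by you: the question only shows indices 143 to 150, so the 151 used in the test is a made-up value. It builds the same information as categoricalFeaturesInfo with the attribute classes in org.apache.spark.ml.attribute and attaches it to the column as metadata.

```scala
import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NominalAttribute, NumericAttribute}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object CategoricalMetadata {
  // Hypothetical helper: describes a vector column with `numFeatures` slots.
  // Slots listed in `categoricalFeaturesInfo` (index -> number of categories)
  // become nominal attributes; every other slot is marked numeric.
  def featureAttributes(
      name: String,
      numFeatures: Int,
      categoricalFeaturesInfo: Map[Int, Int]): AttributeGroup = {
    val attrs: Array[Attribute] = Array.tabulate(numFeatures) { i =>
      categoricalFeaturesInfo.get(i) match {
        case Some(numValues) =>
          NominalAttribute.defaultAttr.withIndex(i).withNumValues(numValues)
        case None =>
          NumericAttribute.defaultAttr.withIndex(i)
      }
    }
    new AttributeGroup(name, attrs)
  }

  // Hypothetical helper: re-aliases `inputCol` as `outputCol` with the metadata
  // above, so tree estimators in spark.ml can read which slots are categorical.
  // Any metadata already on the column (e.g. from VectorIndexer) is replaced.
  def withCategoricalMetadata(
      df: DataFrame,
      inputCol: String,
      outputCol: String,
      numFeatures: Int,
      categoricalFeaturesInfo: Map[Int, Int]): DataFrame = {
    val group = featureAttributes(outputCol, numFeatures, categoricalFeaturesInfo)
    df.withColumn(outputCol, col(inputCol).as(outputCol, group.toMetadata()))
  }
}
```

The resulting DataFrame can be passed to the dt estimator from the question as is, since it already reads from indexedFeatures. Two caveats: maxBins must be at least as large as the biggest category count (216 in this map), or training fails; and if the features are produced by StringIndexer outputs combined with VectorAssembler, or by VectorIndexer, that nominal metadata is written automatically, which is the "handled automatically" path the answer refers to.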
Source: https://stackoverflow.com/questions/38881853/using-categoricalfeaturesinfo-with-decisiontreeclassifier-method-in-spark