How do I run the Spark decision tree with a categorical feature set using Scala?

你的背包 2021-02-20 18:37

I have a feature set with a corresponding categoricalFeaturesInfo: Map[Int,Int]. However, for the life of me I cannot figure out how I am supposed to get the DecisionTree class

3 answers

没有蜡笔的小新 (OP)

2021-02-20 19:17
You can first encode the categories as numbers, then load the data as if all features were numerical.

When you build a decision tree model in Spark, you only need to tell Spark which features are categorical and what each feature's arity is (the number of distinct categories of that feature). You do this by passing a Map[Int, Int] from feature index to arity.

For example, suppose your data looks like this:
```
1,a,add
2,b,more
1,c,thinking
3,a,to
1,c,me
```
You can first transform the data into a numerical format, numbering the categories of each column from 0:
```
1,0,0
2,1,1
1,2,2
3,0,3
1,2,4
```
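The encoding step above can be sketched in plain Scala, without Spark. This is only an illustration under stated assumptions: the object name `CategoryEncodingSketch` and its fields are hypothetical, and categories are numbered in order of first appearance, which happens to reproduce the numbers shown above.

```scala
object CategoryEncodingSketch {
  // Raw rows from the example: one numeric column followed by two categorical ones.
  val rawRows: Seq[Array[String]] =
    Seq("1,a,add", "2,b,more", "1,c,thinking", "3,a,to", "1,c,me").map(_.split(","))

  // Zero-based indices of the categorical columns (an assumption matching the example).
  val categoricalColumns: Seq[Int] = Seq(1, 2)

  // For each categorical column, map every distinct value to 0, 1, 2, ...
  // in order of first appearance.
  val encoders: Map[Int, Map[String, Int]] =
    categoricalColumns.map { col =>
      col -> rawRows.map(_(col)).distinct.zipWithIndex.toMap
    }.toMap

  // Encoded rows: categorical values become their index, numeric values are parsed.
  val encodedRows: Seq[Array[Double]] = rawRows.map { row =>
    row.zipWithIndex.map { case (value, col) =>
      encoders.get(col).map(_(value).toDouble).getOrElse(value.toDouble)
    }
  }

  // Arity of each categorical feature = number of distinct values seen.
  val categoricalFeaturesInfo: Map[Int, Int] =
    encoders.map { case (col, mapping) => col -> mapping.size }

  def main(args: Array[String]): Unit = {
    encodedRows.foreach(row => println(row.mkString(",")))
    println(categoricalFeaturesInfo)
  }
}
```

Deriving the arity map from the same encoders that produce the numbers keeps the two in sync, so every encoded value stays inside the range 0 to arity - 1 that Spark expects for a categorical feature.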
In that format you can load the data into Spark. Then, to tell Spark that the second and third columns are categorical, create a map keyed by zero-based feature index:
```
val categoricalFeaturesInfo = Map[Int, Int](1 -> 3, 2 -> 5)
```
The map tells Spark that the feature with index 1 has arity 3 and the feature with index 2 has arity 5. Both are treated as categorical when you build the decision tree model by passing that map to the training function:
```
import org.apache.spark.mllib.tree.DecisionTree
val model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins)
```
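Putting the pieces together, a minimal end-to-end sketch with the RDD-based MLlib API might look like the following. Several things here are assumptions and not part of the answer above: the class labels (the example data has none), the local master, the object name `CategoricalTreeSketch`, and the values chosen for `numClasses`, `impurity`, `maxDepth` and `maxBins`. Note that `maxBins` has to be at least as large as the largest arity in the map.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree

object CategoricalTreeSketch {
  // Zero-based feature index -> arity, as in the answer above.
  val categoricalFeaturesInfo: Map[Int, Int] = Map(1 -> 3, 2 -> 5)

  def main(args: Array[String]): Unit = {
    // App name and local master are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("CategoricalTreeSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // The five encoded rows as feature vectors.
      // The labels (first argument) are hypothetical: the example data has none.
      val trainingData = sc.parallelize(Seq(
        LabeledPoint(0.0, Vectors.dense(1.0, 0.0, 0.0)),
        LabeledPoint(1.0, Vectors.dense(2.0, 1.0, 1.0)),
        LabeledPoint(0.0, Vectors.dense(1.0, 2.0, 2.0)),
        LabeledPoint(1.0, Vectors.dense(3.0, 0.0, 3.0)),
        LabeledPoint(0.0, Vectors.dense(1.0, 2.0, 4.0))
      ))

      // Illustrative hyperparameters, not recommendations.
      val numClasses = 2
      val impurity = "gini"
      val maxDepth = 3
      val maxBins = 32 // must be >= the largest arity (5 here)

      val model = DecisionTree.trainClassifier(
        trainingData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins)

      // Inspect the learned splits and classify one encoded row.
      println(model.toDebugString)
      val prediction = model.predict(Vectors.dense(2.0, 1.0, 1.0))
      println(s"Predicted class: $prediction")
    } finally {
      sc.stop()
    }
  }
}
```

Features that do not appear in the map, such as feature 0 here, are treated as continuous.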