How to approach machine learning problems with a high-dimensional input space?

夕颜 2020-12-13 01:25

How should I approach a situation where I try to apply some ML algorithm (classification, to be more specific, SVM in particular) to some high-dimensional input, and the r

5 answers

情深已故 (OP)

2020-12-13 01:27
I would approach the problem as follows:

What do you mean by "the results I get are not quite satisfactory"?

If the classification rate on the training data is unsatisfactory, it implies that either:
- You have outliers in your training data (for example, mislabeled samples). In this case you can try algorithms such as RANSAC to deal with them.
- Your model (SVM in this case) is not well suited to this problem. This can be diagnosed by trying other models (AdaBoost etc.) or adding more parameters to your current model.
- The representation of the data is not well suited to your classification task. In this case, preprocessing the data with feature selection or dimensionality reduction techniques would help.
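
One way to check the second and third points is to fit a few candidate models on the same training set and compare their training accuracy. The sketch below assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for the real problem; the model choices, the 20 PCA components, and all other parameter values are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a high-dimensional problem (assumption: not the asker's data).
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=10, random_state=0
)

# Candidate models: two SVM kernels, a different model family (AdaBoost),
# and an SVM fed a reduced representation (PCA keeps 20 components here).
candidates = {
    "linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "RBF SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "PCA(20) + RBF SVM": make_pipeline(
        StandardScaler(), PCA(n_components=20, random_state=0), SVC(kernel="rbf")
    ),
}

# Fit each candidate and record its accuracy on the training data itself.
train_accuracy = {}
for name, model in candidates.items():
    model.fit(X, y)
    train_accuracy[name] = model.score(X, y)
    print(f"{name:>20}: training accuracy = {train_accuracy[name]:.3f}")
```

If every candidate fits the training data poorly, the representation or the labels are the more likely culprit; if only the SVM does, the model choice is.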
If the classification rate on the test data is unsatisfactory while the rate on the training data is fine, it implies that your model overfits the data:
- Either your model is too complex (too many parameters) and needs to be constrained further,
- Or you trained it on a training set that is too small and you need more data.
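
Both overfitting causes can be probed with a held-out split. The sketch below, again assuming scikit-learn and synthetic data, prints the train/test gap for several values of the SVM regularization parameter `C` (smaller `C` means a more constrained model) and then uses `learning_curve` to show how the validation score changes as more training data is used. The specific values of `C`, the split sizes, and the dataset shape are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: few samples, many features (assumption: not the asker's data).
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 1) Model complexity: compare train and test accuracy for different C.
gaps = {}
for C in (0.01, 1.0, 100.0):
    model = SVC(kernel="rbf", C=C, gamma="scale").fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    gaps[C] = train_acc - test_acc
    print(f"C={C:<6} train={train_acc:.3f} test={test_acc:.3f} gap={gaps[C]:.3f}")

# 2) Training set size: cross-validated scores at increasing training sizes.
sizes, train_scores, val_scores = learning_curve(
    SVC(kernel="rbf", C=1.0, gamma="scale"),
    X, y, train_sizes=np.linspace(0.2, 1.0, 5), cv=5,
)
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n_train={n:<4} train={tr:.3f} validation={va:.3f}")
```

A large gap that shrinks as `C` decreases points to an over-complex model; a validation score that is still rising at the largest training size suggests that more data would help.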
Of course, it may be a mixture of the above. These are all "blind" methods of attacking the problem. To gain more insight into the problem, you can use visualization methods that project the data into lower dimensions, or look for models that are better suited to the problem domain as you understand it (for example, if you know the data is normally distributed, you can use GMMs to model it).
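
As a minimal sketch of the projection idea, the following uses plain PCA computed with NumPy's SVD to map high-dimensional points to two coordinates that can then be passed to any scatter plot. PCA is only one possible projection, and the helper name `pca_project` and the synthetic two-class data are hypothetical, introduced here for illustration.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return X_centered @ Vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data: two classes in 100 dimensions, shifted apart along one axis.
labels = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 100))
X[:, 0] += 6.0 * labels

Z, ratio = pca_project(X)
print("projected shape:", Z.shape)
print("explained variance ratio of the two components:", ratio)
```

Coloring the rows of `Z` by `labels` in a scatter plot gives a quick impression of whether the classes separate in a linear projection; if they do not, that says little about nonlinear separability, so it is a diagnostic aid rather than a verdict.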