How to approach machine learning problems with high dimensional input space?

夕颜 2020-12-13 01:25

How should I approach a situation when I try to apply some ML algorithm (classification, to be more specific, SVM in particular) over some high dimensional input, and the r

5 answers
  •  情深已故
    2020-12-13 01:27

    I would approach the problem as follows:

    What do you mean by "the results I get are not quite satisfactory"?

    If the classification rate on the training data is unsatisfactory, one of the following is likely:

    • You have outliers in your training data (for example, samples that carry the wrong label). In this case you can try a robust fitting method such as RANSAC to deal with them.
    • Your model (SVM in this case) is not well suited to this problem. This can be diagnosed by trying other models (AdaBoost etc.) or by adding more parameters to your current model.
    • The representation of the data is not well suited to your classification task. In this case, preprocessing the data with feature selection or dimensionality reduction techniques would help.
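
    As a rough illustration of the last two points, the sketch below compares training accuracy across a few models and one reduced representation. It assumes scikit-learn is available; the synthetic dataset, the choice of 20 PCA components, and all hyperparameters are placeholders, not values taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the real data
# (300 samples, 200 features, only 10 of them informative).
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=10, n_redundant=10, random_state=0
)

# Candidate models and representations; all settings are illustrative.
candidates = {
    "linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)),
    "RBF SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "PCA(20) + RBF SVM": make_pipeline(
        StandardScaler(), PCA(n_components=20, random_state=0), SVC(kernel="rbf", C=1.0)
    ),
}

# Fit each candidate and record its accuracy on the training data itself.
train_scores = {}
for name, model in candidates.items():
    model.fit(X, y)
    train_scores[name] = model.score(X, y)
    print(f"{name}: training accuracy = {train_scores[name]:.3f}")
```

    If every candidate scores poorly on the training data, the representation (or the labels) is the more likely culprit; if only the SVM does, the model choice is.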

    If the classification rate on the training data is fine but the rate on the test data is unsatisfactory, it implies that your model overfits the data:

    • Either your model is too complex (too many parameters) and needs to be constrained further,
    • Or you trained it on a training set that is too small, and you need more data.
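
    Both cases can be probed with a held-out split. The sketch below, again assuming scikit-learn and a synthetic placeholder dataset, sweeps the SVM regularization parameter `C` (smaller `C` constrains the model more) and then computes a learning curve to see whether validation accuracy is still rising with more data. The grid of `C` values and the training-set fractions are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the real high dimensional data.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 1) Model complexity: a large gap between training and test accuracy
#    suggests overfitting; a smaller C regularizes the SVM more strongly.
results = []
for C in (0.01, 0.1, 1.0, 10.0, 100.0):
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma="scale"))
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results.append((C, train_acc, test_acc))
    print(f"C={C:<6} train={train_acc:.3f} test={test_acc:.3f}")

# 2) Training set size: if validation accuracy keeps climbing as the
#    training set grows, more data is likely to help.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
sizes, train_scores, val_scores = learning_curve(
    model, X_train, y_train, train_sizes=[0.2, 0.5, 1.0], cv=5,
    shuffle=True, random_state=0,
)
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
for n, tr, va in zip(sizes, train_mean, val_mean):
    print(f"n={n:<4} train={tr:.3f} validation={va:.3f}")
```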

    Of course, it may be a mixture of the above. These are all "blind" ways to attack the problem. To gain more insight, you can visualize the data by projecting it into lower dimensions, or look for models that better suit the problem domain as you understand it (for example, if you know the data is normally distributed, you can model it with Gaussian mixture models, GMMs).
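
    A minimal sketch of both ideas, assuming scikit-learn and NumPy: project the data to two dimensions with PCA for inspection, then fit one GMM per class and classify by the higher log-likelihood. PCA is only one possible projection, and the synthetic data, the two mixture components per class, and the equal class priors are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the real data.
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=5, n_redundant=5, random_state=0
)

# Project to 2 dimensions for visual inspection.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X_scaled)
print("variance explained by 2 components:", pca.explained_variance_ratio_.sum())
# X_2d[:, 0] and X_2d[:, 1] can now be scatter-plotted, coloured by y,
# with any plotting library.

# Fit one GMM per class on the projected data (2 components is arbitrary).
classes = np.unique(y)
gmms = {
    c: GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(
        X_2d[y == c]
    )
    for c in classes
}

# Classify each point by the class whose GMM gives the higher log-likelihood.
# This ignores class priors, which is reasonable only for balanced classes.
log_lik = np.column_stack([gmms[c].score_samples(X_2d) for c in classes])
y_pred = classes[np.argmax(log_lik, axis=1)]
print("GMM training accuracy on the 2-D projection:", np.mean(y_pred == y))
```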
