All binary predictors in a classification task

Question

I am doing my analysis in R and will be implementing four algorithms:

1. Random forest (RF)
2. Logistic regression
3. Support vector machine (SVM)
4. Linear discriminant analysis (LDA)

I have 50 predictors and 1 target variable. All of the predictors and the target are binary, taking only the values 0 and 1.

I have the following questions:

Should I convert them all into factors?
After converting them into factors, the RF algorithm gives 100% accuracy, which I find very surprising.
Also, for the other algorithms, how should I prepare my variables before feeding them in?

Thanks

Answer 1:

If your variables / predictors are categorical, it is best to convert them to factors. Otherwise, they will likely be treated as numerical values.

If you are doing a classification task, it is best to have the target / response variable as a factor as well.

It is also worth checking the documentation of the functions you use, to make sure they will not convert factors to numerical values.
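
As a minimal sketch of the conversion described above, using base R only: the data frame `df`, the column names `x1` to `x50` and `y`, and the simulated values are all assumptions made for illustration, not the asker's data. The `model.matrix()` call at the end is one way to see how a formula-based function would encode the factors.

```r
# Hypothetical data: 50 binary predictors (x1..x50) and a binary target y.
set.seed(42)
n <- 200
df <- as.data.frame(matrix(rbinom(n * 50, 1, 0.5), nrow = n))
names(df) <- paste0("x", 1:50)
df$y <- rbinom(n, 1, 0.5)

# Convert every column, predictors and target alike, to a factor.
# Explicit levels keep both classes even if one is absent in a column.
df[] <- lapply(df, factor, levels = c(0, 1))

# Check that nothing is still numeric.
all_factors <- all(sapply(df, is.factor))
print(all_factors)

# Inspect how a formula interface would encode two of the factors:
# each two-level factor becomes a single 0/1 dummy column.
mm <- model.matrix(y ~ x1 + x2, data = df)
print(colnames(mm))
```

Whether a given modelling function keeps the columns as factors or recodes them internally still depends on that function, so its documentation remains the place to confirm.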

Answer 2:

Use AdaBoost.

Take a look at different Kaggle kernels, especially those for the Mercedes competition, to get an idea of how to implement AdaBoost.

https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/kernels

That dataset is a mix of numerical variables, factors, and 0/1 variables.

Source: https://stackoverflow.com/questions/46844180/all-binary-predictors-in-a-classification-task

Tags

machine-learning

statistics

random-forest

r-caret
