Error when using Random Forest in R

自闭症患者 2020-12-20 21:17

I am using a dataset with a column mvar_1 that holds the name of one of 5 parties that each citizen voted for last year. The other variables are just demographic variab

4 Answers
  • 2020-12-20 21:30

    One of your mvar variables is a factor with more than 53 levels, and randomForest cannot handle categorical predictors with more than 53 categories.

    You may have a categorical variable with lots of levels, like demographic group, and you should aggregate it into fewer levels to use this package. (See here for the best way of doing it)
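
One possible way to aggregate levels, as a rough sketch only: keep the most frequent levels and lump the rest into a single "Other" level. The data, the variable name `group`, and the cutoff of 10 are all made up for illustration; the right grouping depends on what the variable means.

```r
# Hypothetical data: 10 common levels (40 rows each) plus 60 levels seen once.
group <- factor(c(rep(sprintf("g%02d", 1:10), each = 40),
                  sprintf("g%02d", 11:70)))
nlevels(group)  # more than 53, so randomForest would reject it

# Keep the 10 most frequent levels and lump everything else into "Other".
counts <- sort(table(group), decreasing = TRUE)
keep   <- names(counts)[1:10]

lumped <- as.character(group)
lumped[!(lumped %in% keep)] <- "Other"
lumped <- factor(lumped)

nlevels(lumped)  # now well below the 53-level limit
```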

    More likely, you have a non-categorical variable incorrectly typed as a factor. In this case you should fix it by typing your variable correctly. E.g. to get a numeric from a factor, you call as.numeric(as.character(myfactor)).
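
A small sketch of why the as.character() step matters, using a made-up factor `f`: calling as.numeric() directly on a factor returns the internal level codes, not the values you see printed.

```r
# Hypothetical factor that really holds numbers stored as text.
f <- factor(c("10", "2.5", "300"))

codes  <- as.numeric(f)                # internal level codes, not the values
values <- as.numeric(as.character(f))  # the actual numbers

codes
values
```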

    If you don't know what a factor is, the second option is probably it. Run summary(data.train); this will help you see which mvar variables are incorrectly typed. If a variable is typed as numeric, you will see min, max, mean, median, etc. If a numeric variable is incorrectly typed as a factor, you will not see those statistics; instead you will see the number of occurrences of each level.

    In any case, calling summary will help you because it shows the number of levels for each factor. The variables with >53 levels are causing the issue.
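
Besides reading the summary output by eye, the level counts can be pulled out directly. This is a minimal sketch on a made-up data frame `df` (standing in for data.train); the column names are hypothetical.

```r
# Hypothetical stand-in for data.train.
df <- data.frame(
  age   = c(23, 41, 35),
  party = factor(c("A", "B", "A")),
  id    = factor(c("1001", "1002", "1003"))
)

# Find the factor columns and count the levels of each one.
is_fac   <- vapply(df, is.factor, logical(1))
n_levels <- vapply(df[is_fac], nlevels, integer(1))

n_levels                          # number of levels per factor column
names(n_levels)[n_levels > 53]    # columns that would trigger the error
```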

  • 2020-12-20 21:50

    This error occurs when you train your model with the entire dataset and not with the train data. Try fitting the model on the train data and use the test data to perform prediction.
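
A rough sketch of such a split, using the built-in iris data as a stand-in since the original dataset is not shown. The 70/30 ratio and the seed are arbitrary assumptions, and the model is only fitted if the randomForest package happens to be installed.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Randomly assign about 70% of the rows to training, the rest to testing.
n         <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))
train     <- iris[train_idx, ]
test      <- iris[-train_idx, ]

# Fit on the training rows only, then predict on the held-out rows.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit  <- randomForest::randomForest(Species ~ ., data = train)
  pred <- predict(fit, newdata = test)
}
```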

  • 2020-12-20 21:52

    I had the same problem, but solved it after seeing that my data used a comma as the decimal separator and I had imported it without indicating that.

    After importing the table using read.table(data, dec=",") the problem was solved!
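
To illustrate the effect, here is a small sketch with made-up inline data (the `text =` argument replaces a real file). Without `dec = ","` the column is not recognized as numeric; stringsAsFactors = TRUE is set explicitly to mimic the older default that turns such columns into factors.

```r
# Hypothetical file contents with decimal commas.
txt <- "age income\n23 1500,5\n41 2300,75\n35 1800,0"

# Without dec: "1500,5" is not a number, so the column becomes a factor.
bad  <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# With dec = ",": the column is parsed as numeric.
good <- read.table(text = txt, header = TRUE, dec = ",")

class(bad$income)
class(good$income)
```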

  • 2020-12-20 21:52

    As antoine-sac pointed out, in my case this error was because of numeric variables appearing as factors. The difference is that R did the conversion itself when it was importing my (numeric) file.

    Casting the factors as numerics didn't work. But what worked was using strip.white = TRUE when importing the dataset. (I found this solution here.)
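
A sketch of one situation where this helps, with made-up inline data: a missing value written with surrounding spaces may not be recognized as NA, so the whole column can end up non-numeric. Whether this matches the file in question is an assumption.

```r
# Hypothetical comma-separated contents with stray spaces around the values.
txt <- "age,income\n23, 1500\n41, NA \n35, 1800"

# Without strip.white, the padded " NA " may not match na.strings,
# and the income column can come in as a factor instead of a number.
loose <- read.table(text = txt, header = TRUE, sep = ",",
                    stringsAsFactors = TRUE)

# With strip.white = TRUE the padding is removed before NA matching,
# so the column is read as numeric with a proper missing value.
clean <- read.table(text = txt, header = TRUE, sep = ",",
                    strip.white = TRUE)

class(loose$income)
class(clean$income)
```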
