Given is a simple CSV file:
A,B,C
Hello,Hi,0
Hola,Bueno,1
Obviously the real dataset is far more complex than this, but this one reproduces
I had a similar issue and found that pandas.get_dummies() solved the problem. Specifically, it splits out columns of categorical data into sets of boolean columns, one new column for each unique value in each input column. In your case, you would replace train_x = test[cols]
with:
train_x = pandas.get_dummies(test[cols])
This transforms the train_x Dataframe into the following form, which RandomForestClassifier can accept:
C A_Hello A_Hola B_Bueno B_Hi
0 0 1 0 0 1
1 1 0 1 1 0