I want to encode 3 categorical features out of the 10 features in my dataset. I use sklearn.preprocessing to do so, as follows:
From the documentation:
categorical_features : “all” or array of indices or mask
Specify what features are treated as categorical.
‘all’ (default): All features are treated as categorical.
array of indices: Array of categorical feature indices.
mask: Array of length n_features and with dtype=bool.
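To make the two non-default forms concrete, here is a minimal sketch showing that an array of indices and a boolean mask describe the same selection. The feature count of 10 and the indices 0, 2 and 6 are taken from this question; the variable names are only illustrative.

```python
import numpy as np

n_features = 10          # total number of columns in the dataset
cat_features = [0, 2, 6]  # indices of the categorical columns

# Build the equivalent boolean mask: one entry per feature,
# True where the feature is categorical.
mask = np.zeros(n_features, dtype=bool)
mask[cat_features] = True

# Going back from the mask to the indices.
recovered = np.flatnonzero(mask)
```

Either `cat_features` or `mask` should describe the same three columns.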
Column names of a pandas DataFrame won't work. If your categorical features are columns 0, 2 and 6, use:
from sklearn import preprocessing
cat_features = [0, 2, 6]
enc = preprocessing.OneHotEncoder(categorical_features=cat_features)
enc.fit(dataset.values)
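Note that the categorical_features argument was deprecated in scikit-learn 0.20 and removed in 0.22, so the snippet above only runs on older releases. On newer versions, one possible equivalent is to wrap OneHotEncoder in a ColumnTransformer. The following is a rough sketch under that assumption; the array X is made-up toy data standing in for dataset.values.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for dataset.values:
# 4 rows, 10 features, with columns 0, 2 and 6 categorical.
X = np.array([
    [0, 1.5, 1, 0.2, 3.0, 7.0, 2, 0.1, 0.0, 1.0],
    [1, 2.5, 0, 0.4, 1.0, 8.0, 0, 0.2, 1.0, 2.0],
    [0, 3.5, 1, 0.6, 2.0, 9.0, 1, 0.3, 0.0, 3.0],
    [2, 4.5, 0, 0.8, 4.0, 6.0, 2, 0.4, 1.0, 4.0],
])

cat_features = [0, 2, 6]

# One-hot encode only the listed columns and pass the rest through.
# sparse_threshold=0 forces a dense NumPy array as output.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), cat_features)],
    remainder="passthrough",
    sparse_threshold=0,
)
encoded = ct.fit_transform(X)
```

In the result, the one-hot columns come first (3 + 2 + 3 = 8 here), followed by the 7 untouched columns.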
Note also that if these categorical features are not already label encoded (i.e. stored as integer codes), you need to apply LabelEncoder to them before using OneHotEncoder.
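As a small illustration of that label-encoding step, the sketch below turns a column of strings into integer codes. The colour values are invented for the example. On scikit-learn 0.20 and later, OneHotEncoder accepts string categories directly, so this step is mainly relevant to older versions.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string-valued categorical column.
colors = ["red", "green", "blue", "green"]

le = LabelEncoder()
# Classes are sorted alphabetically, then mapped to 0..n_classes-1.
codes = le.fit_transform(colors)

# The original labels can be recovered from the codes.
restored = le.inverse_transform(codes)
```

Each categorical column would need its own LabelEncoder, since the mapping is fitted per column.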