My data consists of a mix of continuous and categorical features. Below is a small snippet of what my data looks like in CSV format (consider it as data collected by a su…
Thanks to the ideas above:
To discretize continuous values, you may use:

- the `pd.cut` or `pd.qcut` functions from pandas, or
- scikit-learn's `KBinsDiscretizer` class (with parameter `encode` set to `'ordinal'`), where
    - `strategy='uniform'` discretizes in the same manner as `pd.cut` (equal-width bins), and
    - `strategy='quantile'` discretizes in the same manner as `pd.qcut` (equal-frequency bins).

Since examples for cut/qcut are provided in previous answers, let's go on with a clean example of `KBinsDiscretizer`:
```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# 6 samples, 2 continuous features
A = np.array([[24, 0.2], [35, 0.3], [74, 0.4], [96, 0.5], [2, 0.6], [39, 0.8]])
print(A)
# [[24.   0.2]
#  [35.   0.3]
#  [74.   0.4]
#  [96.   0.5]
#  [ 2.   0.6]
#  [39.   0.8]]

# 3 equal-width bins per feature, returned as ordinal bin indices
enc = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
enc.fit(A)
print(enc.transform(A))
# [[0. 0.]
#  [1. 0.]
#  [2. 1.]
#  [2. 1.]
#  [0. 2.]
#  [1. 2.]]
```
As shown in the output, each feature has been discretized independently into 3 bins, labeled 0, 1 and 2. Hope this helps :)
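To check the correspondence between the `strategy` values and `pd.cut`/`pd.qcut` described above, here is a small sketch that bins the first column of `A` both ways and prints the learned `bin_edges_`. It assumes reasonably recent pandas and scikit-learn; depending on the version, `strategy='quantile'` may emit a FutureWarning about default parameters, which does not change the result here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Same toy data as above; only the first column is compared
A = np.array([[24, 0.2], [35, 0.3], [74, 0.4], [96, 0.5], [2, 0.6], [39, 0.8]])
col = A[:, [0]]  # KBinsDiscretizer expects a 2-D input

# strategy='uniform': equal-width bins, like pd.cut
uni = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform').fit(col)
uni_bins = uni.transform(col).ravel().astype(int)
cut_bins = pd.cut(A[:, 0], bins=3, labels=False)

# strategy='quantile': equal-frequency bins, like pd.qcut
qnt = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='quantile').fit(col)
qnt_bins = qnt.transform(col).ravel().astype(int)
qcut_bins = pd.qcut(A[:, 0], q=3, labels=False)

print(uni.bin_edges_[0])  # equal-width edges between min and max
print(qnt.bin_edges_[0])  # edges at the 0th, 33rd, 67th and 100th percentiles
print(uni_bins, cut_bins)
print(qnt_bins, qcut_bins)
```

The match is not guaranteed for every input: `pd.cut` and `pd.qcut` use right-closed intervals by default, whereas `KBinsDiscretizer` puts a value lying exactly on an interior edge into the upper bin, so the two can disagree for such values. No value in this column sits on an interior edge, which is why the results agree here.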
Final notes:

- For the difference between cut versus qcut, see this post.
- Use `KBinsDiscretizer(encode='onehot')` to get a one-hot encoding of the resulting bins of that feature instead of ordinal bin indices.
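A minimal sketch of the one-hot variant on the same toy array, assuming the same settings as the ordinal example (3 uniform bins). With `encode='onehot'` the result is a SciPy sparse matrix, so it is converted with `.toarray()` here; `encode='onehot-dense'` would return a NumPy array directly.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

A = np.array([[24, 0.2], [35, 0.3], [74, 0.4], [96, 0.5], [2, 0.6], [39, 0.8]])

# encode='onehot' returns a sparse matrix; densify it for printing
enc_oh = KBinsDiscretizer(n_bins=3, encode='onehot', strategy='uniform')
onehot = enc_oh.fit_transform(A).toarray()

# Each feature contributes n_bins columns: 2 features x 3 bins = 6 columns
print(onehot.shape)  # (6, 6)

# Cross-check with the ordinal encoding: within each feature's block of
# 3 columns, the position of the 1 is that feature's ordinal bin index
ordinal = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform').fit_transform(A)
recovered = np.column_stack([onehot[:, :3].argmax(axis=1), onehot[:, 3:].argmax(axis=1)])
print(np.array_equal(recovered, ordinal))  # True
```

Each row has exactly one 1 per original feature, so the one-hot output carries the same information as the ordinal indices, just in a form that does not imply an ordering between bins.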