My data consists of a mix of continuous and categorical features. Below is a small snippet of what my data looks like in CSV format (consider it as data collected by a su
You could use the pandas.cut method, like this:
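import pandas as pd

# Bin edges; by default each bin is right-inclusive: (0, 4], (4, 10], (10, 30], ...
bins = [0, 4, 10, 30, 45, 99999]
# One label per bin, so len(labels) == len(bins) - 1
labels = ['Very_Low_Fare', 'Low_Fare', 'Med_Fare', 'High_Fare', 'Very_High_Fare']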
train_orig.Fare[:10]
Out[0]:
0 7.2500
1 71.2833
2 7.9250
3 53.1000
4 8.0500
5 8.4583
6 51.8625
7 21.0750
8 11.1333
9 30.0708
Name: Fare, dtype: float64
pd.cut(train_orig.Fare, bins=bins, labels=labels)[:10]
Out[50]:
0 Low_Fare
1 Very_High_Fare
2 Low_Fare
3 Very_High_Fare
4 Low_Fare
5 Low_Fare
6 Very_High_Fare
7 Med_Fare
8 Med_Fare
9 High_Fare
Name: Fare, dtype: category
Categories (5, object): [High_Fare < Low_Fare < Med_Fare < Very_High_Fare < Very_Low_Fare]
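If you want to try this without the original train_orig DataFrame, here is a minimal self-contained sketch. It assumes a small hand-made Series holding the first five fares shown above, and the variable names (fares, fare_band, fare_dummies) are only illustrative. It also shows two optional follow-ups that are not part of the session above: include_lowest=True, so that a fare of exactly 0 falls into the first bin instead of becoming NaN, and pd.get_dummies, in case your model needs one indicator column per category.

```python
import pandas as pd

# Hypothetical sample: the first five fares from the output above.
fares = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05], name="Fare")

bins = [0, 4, 10, 30, 45, 99999]
labels = ['Very_Low_Fare', 'Low_Fare', 'Med_Fare', 'High_Fare', 'Very_High_Fare']

# Map each continuous fare to its labelled bin (an ordered categorical).
fare_band = pd.cut(fares, bins=bins, labels=labels)

# With the default settings the first bin is (0, 4], so a fare of 0 is left out.
# include_lowest=True makes the first bin include its left edge.
zero_fare = pd.cut(pd.Series([0.0]), bins=bins, labels=labels, include_lowest=True)

# Optional: expand the categorical into one indicator column per label.
fare_dummies = pd.get_dummies(fare_band, prefix="Fare")

print(fare_band)
print(fare_dummies.columns.tolist())
```

Whether you keep the binned column as a single categorical or expand it into indicator columns depends on the model you feed it to, so treat the last step as one option rather than a requirement.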