How to do discretization of continuous attributes in sklearn?

萌比男神i 2021-01-02 08:41

My data consists of a mix of continuous and categorical features. Below is a small snippet of what my data looks like in CSV format (Consider it as data collected by a su

5 Answers
  •  无人及你
    2021-01-02 09:21

    You could use the pandas.cut method, like this:

    import pandas as pd

    # train_orig is assumed to be a DataFrame with a numeric Fare column
    bins = [0, 4, 10, 30, 45, 99999]
    labels = ['Very_Low_Fare', 'Low_Fare', 'Med_Fare', 'High_Fare', 'Very_High_Fare']
    train_orig.Fare[:10]
    Out[0]: 
    0     7.2500
    1    71.2833
    2     7.9250
    3    53.1000
    4     8.0500
    5     8.4583
    6    51.8625
    7    21.0750
    8    11.1333
    9    30.0708
    Name: Fare, dtype: float64
    
    pd.cut(train_orig.Fare, bins=bins, labels=labels)[:10]
    Out[50]: 
    0          Low_Fare
    1    Very_High_Fare
    2          Low_Fare
    3    Very_High_Fare
    4          Low_Fare
    5          Low_Fare
    6    Very_High_Fare
    7          Med_Fare
    8          Med_Fare
    9         High_Fare
    Name: Fare, dtype: category
    Categories (5, object): [High_Fare < Low_Fare < Med_Fare < Very_High_Fare < Very_Low_Fare]
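
If you want to try this without the original DataFrame, here is a minimal self-contained sketch. It rebuilds the ten Fare values shown above as a standalone Series and applies the same bins and labels. The names `fare`, `fare_binned` and `fare_codes` are made up for illustration and are not part of the answer's code. The last step, converting the categories to integer codes, is one possible way to pass the result to an estimator that needs numeric input, not the only one.

```python
import pandas as pd

# The ten Fare values from the output above, rebuilt as a standalone Series
# so the example runs without train_orig (hypothetical stand-in data).
fare = pd.Series(
    [7.2500, 71.2833, 7.9250, 53.1000, 8.0500,
     8.4583, 51.8625, 21.0750, 11.1333, 30.0708],
    name="Fare",
)

bins = [0, 4, 10, 30, 45, 99999]
labels = ["Very_Low_Fare", "Low_Fare", "Med_Fare", "High_Fare", "Very_High_Fare"]

# Intervals are right-closed by default: (0, 4], (4, 10], (10, 30], ...
fare_binned = pd.cut(fare, bins=bins, labels=labels)

# Integer codes follow the order of `labels` (0 for the first label, and so on).
fare_codes = fare_binned.cat.codes

print(pd.DataFrame({"Fare": fare, "bin": fare_binned, "code": fare_codes}))
```

One thing to watch: with these bins a value of exactly 0 falls outside the first interval (0, 4] and comes back as NaN (code -1). Passing `include_lowest=True` to `pd.cut` makes the first interval include its left edge.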
    
