How to create new values in a pandas dataframe column based on values from another column

无人及你 2020-12-17 00:24

I have a pandas DataFrame of values I read in from a CSV file. I have a column labeled 'SleepQuality' and the values are floats from 0.0 to 100.0. I want to create a new col

2 Answers
  • 2020-12-17 01:14

    That's basically a binning operation, so two NumPy tools could be used here.

    Using np.searchsorted -

    import numpy as np

    bins = np.arange(50, 100, 10)  # bin edges: 50, 60, 70, 80, 90
    df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality)
    

    Using np.digitize -

    df['SleepQualityGroup'] = np.digitize(df.SleepQuality, bins)
    

    Sample output -

    In [866]: df
    Out[866]: 
        SleepQuality  SleepQualityGroup
    0           80.4                  4
    1           90.1                  5
    2           66.4                  2
    3           50.3                  1
    4           86.2                  4
    5           75.4                  3
    6           45.7                  0
    7           91.5                  5
    8           61.3                  2
    9           54.0                  1
    10          58.2                  1
    

    Runtime test -

    In [921]: df
    Out[921]: 
        SleepQuality  SleepQualityGroup
    0           80.4                  4
    1           90.1                  5
    2           66.4                  2
    3           50.3                  1
    4           86.2                  4
    5           75.4                  3
    6           45.7                  0
    7           91.5                  5
    8           61.3                  2
    9           54.0                  1
    10          58.2                  1
    
    In [922]: df = pd.concat([df]*10000,axis=0)
    
    # @Dark's soln using pd.cut
    In [923]: %timeit df['new'] = pd.cut(df['SleepQuality'],bins=[0,50 , 60, 70 , 80 , 90,100], labels=[0,1,2,3,4,5])
    1000 loops, best of 3: 1.04 ms per loop
    
    In [926]: %timeit df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality)
    1000 loops, best of 3: 591 µs per loop
    
    In [927]: %timeit df['SleepQualityGroup'] = np.digitize(df.SleepQuality, bins)
    1000 loops, best of 3: 538 µs per loop
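
The two NumPy approaches above can be sketched as a self-contained script. This is only a minimal illustration: the DataFrame is rebuilt from the sample output shown in the answer, and the `edge` array is a hypothetical addition to show how values that fall exactly on a bin edge are treated. With the default `side='left'`, `searchsorted` puts an exact edge value in the lower group, while `np.digitize` (default `right=False`) puts it in the upper group.

```python
import numpy as np
import pandas as pd

# Sample values copied from the answer's output table.
df = pd.DataFrame({
    "SleepQuality": [80.4, 90.1, 66.4, 50.3, 86.2, 75.4,
                     45.7, 91.5, 61.3, 54.0, 58.2]
})

bins = np.arange(50, 100, 10)  # edges: 50, 60, 70, 80, 90
values = df["SleepQuality"].to_numpy()

by_searchsorted = bins.searchsorted(values)  # default side='left'
by_digitize = np.digitize(values, bins)      # default right=False
df["SleepQualityGroup"] = by_searchsorted

# Hypothetical values sitting exactly on a bin edge (not in the original data).
edge = np.array([50.0, 60.0])
edge_left = bins.searchsorted(edge)                 # edge goes to the lower group
edge_right = bins.searchsorted(edge, side="right")  # edge goes to the upper group
edge_digitize = np.digitize(edge, bins)             # same as side='right'

print(df)
print(edge_left, edge_right, edge_digitize)
```

On the sample data both calls give identical groups because no value lies exactly on an edge. If edge values matter, passing `side="right"` to `searchsorted` should make it agree with the default `np.digitize` behaviour.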
    
  • 2020-12-17 01:21

    Use pd.cut, i.e.:

    df['new'] = pd.cut(df['SleepQuality'], bins=[0, 50, 60, 70, 80, 90, 100], labels=[0, 1, 2, 3, 4, 5])
    

    Output:

        SleepQuality  SleepQualityGroup new
    0           80.4                  4   4
    1           90.1                  5   5
    2           66.4                  2   2
    3           50.3                  1   1
    4           86.2                  4   4
    5           75.4                  3   3
    6           45.7                  0   0
    7           91.5                  5   5
    8           61.3                  2   2
    9           54.0                  1   1
    10          58.2                  1   1
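
As a rough, self-contained sketch of the `pd.cut` call above: the DataFrame is rebuilt from the sample output, and the `extremes` series is a hypothetical addition to show the interval boundaries. By default `pd.cut` uses right-closed intervals such as (0, 50], so a value of exactly 0.0 falls outside the first bin and becomes NaN unless `include_lowest=True` is passed. Since the question says values range from 0.0 to 100.0, that case may be worth handling.

```python
import pandas as pd

# Sample values copied from the answer's output table.
df = pd.DataFrame({
    "SleepQuality": [80.4, 90.1, 66.4, 50.3, 86.2, 75.4,
                     45.7, 91.5, 61.3, 54.0, 58.2]
})

edges = [0, 50, 60, 70, 80, 90, 100]
labels = [0, 1, 2, 3, 4, 5]

# Result is a categorical column; intervals are (0, 50], (50, 60], ...
df["new"] = pd.cut(df["SleepQuality"], bins=edges, labels=labels)

# Hypothetical boundary values (not in the original data).
extremes = pd.Series([0.0, 50.0, 100.0])
default_cut = pd.cut(extremes, bins=edges, labels=labels)
inclusive_cut = pd.cut(extremes, bins=edges, labels=labels,
                       include_lowest=True)  # first interval becomes [0, 50]

print(df)
print(default_cut.tolist())
print(inclusive_cut.tolist())
```

The labelled result has a categorical dtype; if plain integers are needed, `df["new"].astype(int)` should work as long as no value fell outside the bins.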
    
