I have a pandas dataframe of values I read in from a csv file. I have a column labeled 'SleepQuality' with float values from 0.0 to 100.0. I want to create a new column 'SleepQualityGroup' that bins these values into integer groups.
That's essentially a binning operation, so two NumPy tools could be used here.
Using np.searchsorted -
import numpy as np

bins = np.arange(50, 100, 10)
df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality)
Using np.digitize -
df['SleepQualityGroup'] = np.digitize(df.SleepQuality, bins)
Sample output -
In [866]: df
Out[866]:
    SleepQuality  SleepQualityGroup
0           80.4                  4
1           90.1                  5
2           66.4                  2
3           50.3                  1
4           86.2                  4
5           75.4                  3
6           45.7                  0
7           91.5                  5
8           61.3                  2
9           54.0                  1
10          58.2                  1
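A self-contained sketch reproducing the above (the sample values are taken from the output shown here; your actual csv data will differ):

```python
import numpy as np
import pandas as pd

# Sample data mirroring the 'SleepQuality' column from the question
df = pd.DataFrame({'SleepQuality': [80.4, 90.1, 66.4, 50.3, 86.2,
                                    75.4, 45.7, 91.5, 61.3, 54.0, 58.2]})

# Bin edges 50, 60, 70, 80, 90 -> groups 0 (below 50) up to 5 (90 and above)
bins = np.arange(50, 100, 10)

# searchsorted returns, for each value, the index where it would be
# inserted into `bins` to keep it sorted -- i.e. its bin number
df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality)

# np.digitize produces identical results for these values
assert (np.digitize(df.SleepQuality, bins) == df.SleepQualityGroup).all()

print(df)
```

Note that the two differ only for values falling exactly on a bin edge (searchsorted's default `side='left'` vs digitize's default `right=False`), which doesn't arise in this sample.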
Runtime test -
In [922]: df = pd.concat([df]*10000,axis=0)
# @Dark's soln using pd.cut
In [923]: %timeit df['new'] = pd.cut(df['SleepQuality'], bins=[0, 50, 60, 70, 80, 90, 100], labels=[0, 1, 2, 3, 4, 5])
1000 loops, best of 3: 1.04 ms per loop
In [926]: %timeit df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality)
1000 loops, best of 3: 591 µs per loop
In [927]: %timeit df['SleepQualityGroup'] = np.digitize(df.SleepQuality, bins)
1000 loops, best of 3: 538 µs per loop
Use pd.cut
i.e.
df['new'] = pd.cut(df['SleepQuality'], bins=[0, 50, 60, 70, 80, 90, 100], labels=[0, 1, 2, 3, 4, 5])
Output:
    SleepQuality  SleepQualityGroup  new
0           80.4                  4    4
1           90.1                  5    5
2           66.4                  2    2
3           50.3                  1    1
4           86.2                  4    4
5           75.4                  3    3
6           45.7                  0    0
7           91.5                  5    5
8           61.3                  2    2
9           54.0                  1    1
10          58.2                  1    1
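One caveat: pd.cut with `labels` returns a Categorical column, not plain integers. A minimal sketch (sample values assumed for illustration) showing the cast if you need an int dtype:

```python
import pandas as pd

df = pd.DataFrame({'SleepQuality': [80.4, 90.1, 66.4, 50.3, 45.7]})

# Intervals are right-closed by default, so e.g. exactly 50.0
# would fall in the (0, 50] bin and get label 0
df['new'] = pd.cut(df['SleepQuality'],
                   bins=[0, 50, 60, 70, 80, 90, 100],
                   labels=[0, 1, 2, 3, 4, 5])

# The result is categorical; cast when plain integers are needed
df['new'] = df['new'].astype(int)
print(df)
```

This right-closed edge behavior is the opposite of the searchsorted approach above, which puts an exact edge value into the higher bin.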