Question
Given a small dataset as follows:
   value  input
0      3      0
1      4      1
2      3     -1
3      2      1
4      3     -1
5      5      0
6      1      0
7      1      1
8      1      1
I have used the following code:
df['pct'] = df['value'] / df['value'].sum()
But I want to calculate pct while excluding rows where input = -1. That is, if input is -1, the corresponding value should not be included in the sum, and pct does not need to be calculated for that row (rows 2 and 4 in this case).
The expected result looks like this:
   value  input   pct
0      3      0  0.18
1      4      1  0.24
2      3     -1   NaN
3      2      1  0.12
4      3     -1   NaN
5      5      0  0.29
6      1      0  0.06
7      1      1  0.06
8      1      1  0.06
How could I do that in Pandas? Thanks.
Answer 1:
Use Series.where to replace the values in rows where input is -1 with missing values, so that sum skips them. Then divide only the rows selected by the mask, using DataFrame.loc, and finally round with Series.round:
mask = df['input'] != -1
df.loc[mask, 'pct'] = (df.loc[mask, 'value'] / df['value'].where(mask).sum()).round(2)
print(df)
   value  input   pct
0      3      0  0.18
1      4      1  0.24
2      3     -1   NaN
3      2      1  0.12
4      3     -1   NaN
5      5      0  0.29
6      1      0  0.06
7      1      1  0.06
8      1      1  0.06
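For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the sample data in the question, and the intermediate name `total` is introduced only for readability; it is not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "value": [3, 4, 3, 2, 3, 5, 1, 1, 1],
    "input": [0, 1, -1, 1, -1, 0, 0, 1, 1],
})

# True for rows that should count toward the total.
mask = df["input"] != -1

# where() turns the excluded rows into NaN, and sum() skips NaN by default.
total = df["value"].where(mask).sum()

# Assign pct only for the kept rows; the excluded rows are left as NaN.
df.loc[mask, "pct"] = (df.loc[mask, "value"] / total).round(2)

print(df)
```

With this data the denominator is 17 (the sum of value without rows 2 and 4), rather than 23 as in the code from the question.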
EDIT: If you need 0 instead of missing values in the excluded rows, pass 0 as the second argument to where. The resulting Series sums to the same total as the version with missing values, so the remaining percentages are unchanged:
s = df['value'].where(df['input'] != -1, 0)
df['pct'] = (s / s.sum()).round(2)
print(df)
   value  input   pct
0      3      0  0.18
1      4      1  0.24
2      3     -1  0.00
3      2      1  0.12
4      3     -1  0.00
5      5      0  0.29
6      1      0  0.06
7      1      1  0.06
8      1      1  0.06
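The claim that both variants give the same denominator can be checked with a short sketch. It rebuilds the sample data from the question; the names `nan_total` and `zero_total` are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "value": [3, 4, 3, 2, 3, 5, 1, 1, 1],
    "input": [0, 1, -1, 1, -1, 0, 0, 1, 1],
})

mask = df["input"] != -1

# Variant 1: excluded rows become NaN, which sum() skips by default.
nan_total = df["value"].where(mask).sum()

# Variant 2: excluded rows become 0, which adds nothing to the sum.
s = df["value"].where(mask, 0)
zero_total = s.sum()

# Both totals should agree, so the kept rows get the same pct either way.
print(nan_total == zero_total)

# Excluded rows end up as 0.0 here instead of NaN.
df["pct"] = (s / zero_total).round(2)
print(df)
```

Which variant fits better depends on whether the excluded rows should read as "not applicable" (NaN) or as a zero share in later steps.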
Source: https://stackoverflow.com/questions/62989116/filter-rows-based-one-column-value-and-calculate-percentage-of-sum-in-pandas