Modify Value of Pandas dataframe Groups

Question

We have the following dataframe (df) with 3 columns. The goal is to make the sum of "Load" within each group of rows sharing an ID equal to 1.

import pandas as pd

df = pd.DataFrame({'Num':[1,2,3,4,5],'ID':['AEC','AEC','CIZ','CIZ','CIZ'],'Load':[0.2093275,0.5384086,0.1465657,0.7465657,0.1465657]})

Num   ID  Load
1   AEC 0.2093275
2   AEC 0.5384086
3   CIZ 0.1465657
4   CIZ 0.7465657
5   CIZ 0.1465657

If a group's total load is less than or greater than 1, we want to add to or subtract from exactly one member of that group so that the sum equals 1, without adding rows to the dataframe (only by modifying existing values). How can we do that?

Thank you all in advance.

Answer 1:

I use sample to randomly pick one row from each group and apply the whole correction to that row.

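# Amount by which each group's total differs from 1, repeated on every row of the group
df['New']=(1-df.groupby('ID').Load.transform('sum'))

# Sample one row per group and add its correction to Load; rows not sampled keep their original Load
df['Load']=df.Load.add(df.groupby('ID').New.apply(lambda x : x.sample(1)).reset_index('ID',drop=True)).fillna(df.Load)

df.drop(columns='New')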
Out[163]: 
   Num   ID      Load
0    1  AEC  0.209327
1    2  AEC  0.790673
2    3  CIZ  0.146566
3    4  CIZ  0.746566
4    5  CIZ  0.106869

Check

df.drop(columns='New').groupby('ID').Load.sum()
Out[164]: 
ID
AEC    1.0
CIZ    1.0
Name: Load, dtype: float64
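
Because the row is picked at random, the output above will differ from run to run. A minimal self-contained sketch of the same idea with a fixed seed is shown below. It assumes a pandas version that provides DataFrameGroupBy.sample (1.1 or later); the names shortfall and picked and the seed value 0 are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Num': [1, 2, 3, 4, 5],
    'ID': ['AEC', 'AEC', 'CIZ', 'CIZ', 'CIZ'],
    'Load': [0.2093275, 0.5384086, 0.1465657, 0.7465657, 0.1465657],
})

# Amount each group is short of (or over) 1, broadcast to every row of the group.
shortfall = 1 - df.groupby('ID')['Load'].transform('sum')

# Pick one row label per group; random_state is an arbitrary seed for repeatability.
picked = df.groupby('ID').sample(n=1, random_state=0).index

# Apply the correction only to the picked rows; no rows are added.
df.loc[picked, 'Load'] += shortfall.loc[picked]

group_sums = df.groupby('ID')['Load'].sum()
print(group_sums)
```

This variant avoids the temporary New column, so there is nothing to drop afterwards.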

Answer 2:

You can use drop_duplicates to keep the first record in each group and then change the Load value so that its group Load sum is 1.

df.loc[df.ID.drop_duplicates().index, 'Load'] -= df.groupby('ID').Load.sum().subtract(1).values

df
Out[92]: 
   Num   ID      Load
0    1  AEC  0.461591
1    2  AEC  0.538409
2    3  CIZ  0.106869
3    4  CIZ  0.746566
4    5  CIZ  0.146566

df.groupby('ID').Load.sum()
Out[93]: 
ID
AEC    1.0
CIZ    1.0
Name: Load, dtype: float64
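
One caveat: the one-liner above pairs the first rows (in order of appearance) with the group sums (in sorted ID order) purely by position, which works here because the IDs already appear in sorted order. A sketch of a variant that aligns on the index instead, and so should not depend on row order, is given below. The helper name adjust_first_member and its parameters are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

def adjust_first_member(frame, key='ID', value='Load', target=1.0):
    """Return a copy where the first row of each group absorbs the correction."""
    out = frame.copy()
    # True on the first occurrence of each key, False on later rows.
    first = ~out.duplicated(key)
    # Per-row excess of the group's total over the target.
    excess = out.groupby(key)[value].transform('sum') - target
    # Subtract the excess from the first row of each group only.
    out.loc[first, value] -= excess[first]
    return out

df = pd.DataFrame({
    'Num': [1, 2, 3, 4, 5],
    'ID': ['AEC', 'AEC', 'CIZ', 'CIZ', 'CIZ'],
    'Load': [0.2093275, 0.5384086, 0.1465657, 0.7465657, 0.1465657],
})

fixed = adjust_first_member(df)
print(fixed)
print(fixed.groupby('ID')['Load'].sum())
```

Since both the mask and the excess are computed per row and share the dataframe's index, the same call can be applied to a frame whose groups are interleaved or in reverse order.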

Source: https://stackoverflow.com/questions/48533538/modify-value-of-pandas-dataframe-groups

Tags

python

pandas

pandas-groupby