Python: Sum values in DataFrame if other values match between DataFrames

回眸只為那壹抹淺笑 submitted on 2020-12-26 03:58:57

Question


I have two DataFrames of different lengths, like these:

DataFrame A:

FirstName    LastName
Adam         Smith
John         Johnson

DataFrame B:

First        Last        Value
Adam         Smith       1.2
Adam         Smith       1.5
Adam         Smith       3.0
John         Johnson     2.5

I want to create a new column in DataFrame A that sums all the values in DataFrame B whose last name matches, so the output in A would be:

FirstName    LastName    Sums
Adam         Smith       5.7
John         Johnson     2.5

If I were in Excel, I'd use

=SUMIF(dfB!B:B, B2, dfB!C:C)
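For comparison, here is a rough pandas analogue of that SUMIF. This is a sketch, not taken from the question or the answers: it totals Value per Last in B once with groupby, then looks up each LastName in A with Series.map. The sample frames are rebuilt from the tables above.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question
df_a = pd.DataFrame({"FirstName": ["Adam", "John"],
                     "LastName": ["Smith", "Johnson"]})
df_b = pd.DataFrame({"First": ["Adam", "Adam", "Adam", "John"],
                     "Last": ["Smith", "Smith", "Smith", "Johnson"],
                     "Value": [1.2, 1.5, 3.0, 2.5]})

# One total per last name in B: the SUMIF "sum range" grouped by the "criteria range"
totals = df_b.groupby("Last")["Value"].sum()

# Look up each LastName of A in those totals; names absent from B come back
# as NaN, so fill them with 0 the way SUMIF returns 0 when nothing matches
df_a["Sums"] = df_a["LastName"].map(totals).fillna(0)
print(df_a)
```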

In Python I've tried several approaches using np.where, df.sum(), dropping indexes, etc., but I'm lost. The code below raises "ValueError: Can only compare identically-labeled Series objects", and I don't think it's written correctly anyway.

df_a['Sums'] = df_a[df_a['LastName'] == df_b['Last']].sum()['Value']
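The error comes from how == works between two Series: pandas aligns them by index label first and refuses when the labels differ (A has labels 0..1, B has 0..3). A small sketch reproducing it with the sample data; Series.isin, which Answer 1 uses, tests membership instead and needs no alignment.

```python
import pandas as pd

df_a = pd.DataFrame({"FirstName": ["Adam", "John"],
                     "LastName": ["Smith", "Johnson"]})
df_b = pd.DataFrame({"First": ["Adam", "Adam", "Adam", "John"],
                     "Last": ["Smith", "Smith", "Smith", "Johnson"],
                     "Value": [1.2, 1.5, 3.0, 2.5]})

# Element-wise == between two Series requires identical index labels;
# here the labels differ, so pandas raises ValueError
try:
    df_a["LastName"] == df_b["Last"]
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# isin asks "does this value occur anywhere in the other Series?",
# so the two lengths do not matter; the result has one boolean per row of df_b
mask = df_b["Last"].isin(df_a["LastName"])
print(mask.tolist())
```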

Huge thanks in advance for any help.


Answer 1:


Use boolean indexing with Series.isin to filter, then aggregate with sum:

df = (df_b[df_b['Last'].isin(df_a['LastName'])]
           .groupby(['First','Last'], as_index=False)['Value']
           .sum())
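A self-contained run of this filter-then-group approach. The extra "Jane Doe" row in B is hypothetical, added only to show that isin drops names that do not appear in A. Note that the result is a new frame keyed by First/Last rather than a column added to A.

```python
import pandas as pd

df_a = pd.DataFrame({"FirstName": ["Adam", "John"],
                     "LastName": ["Smith", "Johnson"]})
# The last row (Jane Doe) is a hypothetical extra, not from the question
df_b = pd.DataFrame({"First": ["Adam", "Adam", "Adam", "John", "Jane"],
                     "Last": ["Smith", "Smith", "Smith", "Johnson", "Doe"],
                     "Value": [1.2, 1.5, 3.0, 2.5, 9.9]})

# Keep only rows of B whose last name appears in A, then sum per (First, Last)
df = (df_b[df_b["Last"].isin(df_a["LastName"])]
      .groupby(["First", "Last"], as_index=False)["Value"]
      .sum())
print(df)
```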

If you want to match on both first and last name:

df = (df_b.merge(df_a, left_on=['First','Last'], right_on=['FirstName','LastName'])
           .groupby(['First','Last'], as_index=False)['Value']
           .sum())
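To see where matching on both names differs from matching on the last name alone, here is a sketch with one hypothetical extra row, "Eve Smith", in B. The default inner merge keeps only rows of B whose (First, Last) pair exists in A, so Eve's value is excluded, while an isin filter on Last alone keeps it.

```python
import pandas as pd

df_a = pd.DataFrame({"FirstName": ["Adam", "John"],
                     "LastName": ["Smith", "Johnson"]})
# "Eve Smith" is a hypothetical row: same last name as Adam, different first name
df_b = pd.DataFrame({"First": ["Adam", "Adam", "Adam", "John", "Eve"],
                     "Last": ["Smith", "Smith", "Smith", "Johnson", "Smith"],
                     "Value": [1.2, 1.5, 3.0, 2.5, 4.0]})

# Inner merge (the default) on both name columns drops Eve Smith
both = (df_b.merge(df_a, left_on=["First", "Last"],
                   right_on=["FirstName", "LastName"])
        .groupby(["First", "Last"], as_index=False)["Value"]
        .sum())

# Filtering on the last name alone keeps her
last_only = df_b[df_b["Last"].isin(df_a["LastName"])]

print(both)
print(last_only["First"].tolist())
```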



Answer 2:


import pandas as pd

# df_b is the left frame, so its own columns (First, Last) go in left_on
df_b_a = (pd.merge(df_b, df_a, left_on=['First', 'Last'], right_on=['FirstName', 'LastName'], how='left')
                .groupby(by=['First', 'Last'], as_index=False)['Value'].sum())

print(df_b_a)

  First     Last  Value
0  Adam    Smith    5.7
1  John  Johnson    2.5
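One detail worth checking: because df_b is the left frame and how='left' keeps every left row, this merge filters nothing out, so names in B that are absent from A still show up in the sums. A sketch with a hypothetical extra "Jane Doe" row in B contrasts it with how='inner'; the helper summed is hypothetical, introduced only for this comparison.

```python
import pandas as pd

df_a = pd.DataFrame({"FirstName": ["Adam", "John"],
                     "LastName": ["Smith", "Johnson"]})
# "Jane Doe" is a hypothetical row with no counterpart in df_a
df_b = pd.DataFrame({"First": ["Adam", "Adam", "Adam", "John", "Jane"],
                     "Last": ["Smith", "Smith", "Smith", "Johnson", "Doe"],
                     "Value": [1.2, 1.5, 3.0, 2.5, 9.9]})

def summed(how):
    # Hypothetical helper: merge B against A on both names with the given
    # join type, then total Value per (First, Last)
    return (pd.merge(df_b, df_a, left_on=["First", "Last"],
                     right_on=["FirstName", "LastName"], how=how)
            .groupby(["First", "Last"], as_index=False)["Value"].sum())

left = summed("left")    # every row of B survives, including Jane Doe
inner = summed("inner")  # only names that also appear in A survive
print(left)
print(inner)
```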



Answer 3:


Use DataFrame.merge + DataFrame.groupby:

# Group by Last only, so each last name yields a single total (like SUMIF)
new_df = (df_a.merge(df_b.groupby('Last', as_index=False)['Value'].sum(),
                     left_on='LastName', right_on='Last', how='left')
              .drop('Last', axis=1))
print(new_df)

To join on both columns:

new_df = (df_a.merge(df_b.groupby(['First', 'Last'], as_index=False)['Value'].sum(),
                     left_on=['FirstName', 'LastName'], right_on=['First', 'Last'], how='left')
              .drop(['First', 'Last'], axis=1))
print(new_df)

Output:

  FirstName LastName  Value
0      Adam    Smith    5.7
1      John  Johnson    2.5
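Since df_a is the left frame here, every row of A is kept, but a row of A with no match in B gets NaN rather than the 0 that SUMIF would return. A sketch with a hypothetical "Mary Jones" row in A, using fillna(0) to mimic SUMIF:

```python
import pandas as pd

# "Mary Jones" is a hypothetical row with no entries in df_b
df_a = pd.DataFrame({"FirstName": ["Adam", "John", "Mary"],
                     "LastName": ["Smith", "Johnson", "Jones"]})
df_b = pd.DataFrame({"First": ["Adam", "Adam", "Adam", "John"],
                     "Last": ["Smith", "Smith", "Smith", "Johnson"],
                     "Value": [1.2, 1.5, 3.0, 2.5]})

# Pre-aggregate B, then left-join onto A so A's rows and order are preserved
sums = df_b.groupby(["First", "Last"], as_index=False)["Value"].sum()
new_df = (df_a.merge(sums, left_on=["FirstName", "LastName"],
                     right_on=["First", "Last"], how="left")
          .drop(["First", "Last"], axis=1))

# Unmatched rows come back as NaN; replace with 0 to mirror SUMIF
new_df["Value"] = new_df["Value"].fillna(0)
print(new_df)
```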


Source: https://stackoverflow.com/questions/59068369/python-sum-values-in-dataframe-if-other-values-match-between-dataframes
