Compare column values in two or three dataframe and merge

瘦欲@ 提交于 2019-12-11 12:14:05

问题


I have already checked few earlier questions and I have some what unique problem. I have three excel file and I load them into three different dataframe. Basically I have to add contents of excel_1 and excel_2 and compare the contents against excel_3

Example data: (excel_1 sales Territory#1)

  Name  Year    Item    sales_Amount1
  A1        1.2019  Badam   2
  A1        1.2019  Badam   10
  A1        1.2019  carrot  8
  A1        1.2019  carrot  10
  A2        1.2019  Badam   10
  A2        1.2019  Badam   20
  A3        2.2019  soap    3
  A1        2.2019  soap    1

Example data: (excel_2 sales Territory#2)

  Name  Year    Item    sales_Amount2
  A1        1.2019  Badam   60
  A1        1.2019  Badam   10
  A2        1.2019  Badam   40
  A2        1.2019  Badam   1
  A3        2.2019  soap    1
  A3        2.2019  soap    10
  A1        2.2019  soap    10

excel_3 target also has similar data

  Name  Year    Item    target_Amount
  A1        1.2019  Badam   100
  A2        1.2019  Badam   30
  A1        1.2019  carrot  200
  A3        2.2019  soap    3

Basically I have to add sales amount 1 and 2 and compare the results against target data. I would like to arrive a single csv with column details as below. Hence I can do the calculations as I mentioned.

  Name   Year    Item   sales_Amount1  Sales_Amount2  target_Amount
  A1     1.2019  Badam  12              70             100
  A1     1.2019  carrot 18              0              200
  A2     1.2019  Badam  30              41             30
  A1     2.2019  soap   1               10             0
  A3     2.1019  soap   3               11             3
df1 = pd.read_excel(r"excel_1.xlxs")
sum_sales1 = df1.groupby(['Name','Year', 'Item']).agg({'sales_Amount1': 'sum'})

df2 = pd.read_excel(r"excel_2.xlxs")
sum_sales1 = df2.groupby(['Name','Year', 'Item']).agg({'sales_Amount2': 'sum'})

df3 = pd.read_excel(r"excel_3.xlxs")
sum_sales1 = df3.groupby(['Name','Year', 'Item']).agg({'target_Amount': 'sum'})

Basically I loaded each csv into one dataframe and then groupby to get the aggregated sum of each items as shown above. It's bit tricky now to compare and merge all three dataframe as shown above with the following columns

Name   Year    Item   sales_Amount1  Sales_Amount2  target_Amount

Is this approach right towards comparing three dataframes and merge them as one or should I move towards pivot table. It's bit confusing which one to chose before I proceed. Thanks.


回答1:


Use concat with DataFrame.fillna:

sum_sales1 = df1.groupby(['Name','Year', 'Item']).agg({'sales_Amount1': 'sum'})
sum_sales2 = df2.groupby(['Name','Year', 'Item']).agg({'sales_Amount2': 'sum'})
sum_sales3 = df3.groupby(['Name','Year', 'Item']).agg({'target_Amount': 'sum'})

df = (pd.concat([sum_sales1, sum_sales2, sum_sales3],
        axis=1).fillna(0).astype(int).reset_index())
print (df)
  Name    Year    Item  sales_Amount1  sales_Amount2  target_Amount
0   A1  1.2019   Badam             12             70            100
1   A1  1.2019  carrot             18              0            200
2   A1  2.2019    soap              1             10              0
3   A2  1.2019   Badam             30             41             30
4   A3  2.2019    soap              3             11              3

If last column is necessary aggregate - then use list comprehension:

dfs = [df1, df2, df3]
dfs = [x.groupby(['Name','Year', 'Item']).agg({x.columns[-1]: 'sum'}) for x in dfs]

df = pd.concat(dfs, axis=1).fillna(0).astype(int).reset_index()
print (df)
  Name    Year    Item  sales_Amount1  sales_Amount2  target_Amount
0   A1  1.2019   Badam             12             70            100
1   A1  1.2019  carrot             18              0            200
2   A1  2.2019    soap              1             10              0
3   A2  1.2019   Badam             30             41             30
4   A3  2.2019    soap              3             11              3


来源:https://stackoverflow.com/questions/58603076/compare-column-values-in-two-or-three-dataframe-and-merge

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!