How to implode(reverse of pandas explode) based on a column

问题

I have a dataframe df like below

  NETWORK       config_id       APPLICABLE_DAYS  Case    Delivery  
0   Grocery     5399            SUN               10       1        
1   Grocery     5399            MON               20       2       
2   Grocery     5399            TUE               30       3        
3   Grocery     5399            WED               40       4

I want to implode( combine Applicable_days from multiple rows into single row like below) and get the average case and delivery per config_id

  NETWORK       config_id       APPLICABLE_DAYS      Avg_Cases    Avg_Delivery 
0   Grocery     5399            SUN,MON,TUE,WED         90           10

using the groupby on network,config_id i can get the avg_cases and avg_delivery like below.

df.groupby(['network','config_id']).agg({'case':'mean','delivery':'mean'})

But How do i be able to join APPLICABLE_DAYS while performing this aggregation?

回答1:

If you want the "opposite" of explode, then that means bringing it into a list in Solution #1. You can also join as a sting in Solution #2:

Use lambda x: x.tolist() for the 'APPLICABLE_DAYS' column within your .agg groupby function:

df = (df.groupby(['NETWORK','config_id'])
      .agg({'APPLICABLE_DAYS': lambda x: x.tolist(),'Case':'mean','Delivery':'mean'})
      .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
      .reset_index())
df
Out[1]: 
   NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399  [SUN, MON, TUE, WED]         25           2.5

Use lambda x: ",".join(x) for the 'APPLICABLE_DAYS' column within your .agg groupby function:

 df = (df.groupby(['NETWORK','config_id'])
      .agg({'APPLICABLE_DAYS': lambda x: ",".join(x),'Case':'mean','Delivery':'mean'})
      .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
      .reset_index())
df
Out[1]: 
   NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
0  Grocery       5399       SUN,MON,TUE,WED         25           2.5

If you are looking for the sum, then you can just change mean to sum for the Cases and Delivery columns.

回答2:

Your results look more like a sum, than average; The solution below uses named aggregation :

    df.groupby(["NETWORK", "config_id"]).agg(
    APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
    Total_Cases=("Case", "sum"),
    Total_Delivery=("Delivery", "sum"),
)

                        APPLICABLE_DAYS       Total_Cases   Total_Delivery
NETWORK config_id           
Grocery 5399                SUN,MON,TUE,WED           100      10

If it is the mean, then you can change the 'sum' to 'mean' :

df.groupby(["NETWORK", "config_id"]).agg(
    APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
    Avg_Cases=("Case", "mean"),
    Avg_Delivery=("Delivery", "mean"),
)

                    APPLICABLE_DAYS   Avg_Cases Avg_Delivery
NETWORK config_id           
Grocery 5399         SUN,MON,TUE,WED      25      2.5

来源：https://stackoverflow.com/questions/64235312/how-to-implodereverse-of-pandas-explode-based-on-a-column

标签

pandas

numpy

explode

implode