How to Merge Columns of Rows in a DataFrame That Fulfill a Condition, While Deleting Those Rows

Submitted by 喜你入骨 on 2021-01-28 12:36:24

Question


I don't think I can solve this with groupby() or agg() as in these questions (Question1, Question2).

I have a pandas.DataFrame with one identifier column (ID_Code) and some information columns (information 1 and information 2). I need to aggregate some of the identifiers: some rows have to be deleted, and their information has to be added to specific other rows.

To illustrate the problem, here is a made-up example:

import pandas as pd

inp = [{'ID_Code':1,'information 1':list(x * 3 for x in range(2, 5)),'information 2':list(x / 3 for x in range(2, 5))},
       {'ID_Code':2,'information 1':list(x * 0.5 for x in range(2, 5)),'information 2':list(x / 2 for x in range(2, 5))},
       {'ID_Code':3,'information 1':list(x * 0.2 for x in range(25, 29)),'information 2':list(x / 1 for x in range(2, 5))},
       {'ID_Code':4,'information 1':list(x * 0.001 for x in range(102, 105)),'information 2':list(x / 12 for x in range(2, 5))},
       {'ID_Code':5,'information 1':list(x * 12 for x in range(15, 17)),'information 2':list(x / 24 for x in range(2, 5))},
       {'ID_Code':6,'information 1':list(x * 42 for x in range(2, 9)),'information 2':list(x / 48 for x in range(2, 5))},
       {'ID_Code':7,'information 1':list(x * 23 for x in range(1, 2)),'information 2':list(x / 96 for x in range(2, 5))},
       {'ID_Code':8,'information 1':list(x * 7.8 for x in range(8, 11)),'information 2':list(x / 124 for x in range(2, 5))}]

df = pd.DataFrame(inp)

print(df)
Out:
       ID_Code                                                    information 1   information 2
    0        1                                                       [6, 9, 12]   [0.6666666666666666, 1.0, 1.3333333333333333]
    1        2                                                  [1.0, 1.5, 2.0]   [1.0, 1.5, 2.0]
    2        3                              [5.0, 5.2, 5.4, 5.6000000000000005]   [2.0, 3.0, 4.0]
    3        4  [0.10200000000000001, 0.10300000000000001, 0.10400000000000001]   [0.16666666666666666, 0.25, 0.3333333333333333]
    4        5                                                       [180, 192]   [0.08333333333333333, 0.125, 0.16666666666666666]
    5        6                               [84, 126, 168, 210, 252, 294, 336]   [0.041666666666666664, 0.0625, 0.08333333333333333]
    6        7                                                             [23]   [0.020833333333333332, 0.03125, 0.041666666666666664]
    7        8                                               [62.4, 70.2, 78.0]   [0.016129032258064516, 0.024193548387096774, 0.03225806451612903]

What do I need to do if I want to get rid of ID_Code = 1 and store its information in ID_Code = 3, and get rid of ID_Code = 5 and ID_Code = 7 and store their information in ID_Code = 2, so that the DataFrame looks like this:

   ID_Code                                                    information 1   information 2
0        2                                    [180, 192, 23, 1.0, 1.5, 2.0]   [0.08333333333333333, 0.125, 0.16666666666666666, 0.020833333333333332, 0.03125, 0.041666666666666664, 1.0, 1.5, 2.0]
1        3                    [6, 9, 12, 5.0, 5.2, 5.4, 5.6000000000000005]   [0.6666666666666666, 1.0, 1.3333333333333333, 2.0, 3.0, 4.0]
2        4  [0.10200000000000001, 0.10300000000000001, 0.10400000000000001]   [0.16666666666666666, 0.25, 0.3333333333333333]
3        6                               [84, 126, 168, 210, 252, 294, 336]   [0.041666666666666664, 0.0625, 0.08333333333333333]
4        8                                               [62.4, 70.2, 78.0]   [0.016129032258064516, 0.024193548387096774, 0.03225806451612903]
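The requested merge can be written down as a dictionary from each ID that disappears to the ID that absorbs it. A minimal sketch, assuming `Series.replace` is used for the relabeling (`merge_map` and `ids` are illustrative names, not from the question): after relabeling, rows that share an ID are exactly the rows whose lists have to be concatenated.

```python
import pandas as pd

# source ID -> target ID, as requested in the question
merge_map = {1: 3, 5: 2, 7: 2}

ids = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], name="ID_Code")

# IDs listed in merge_map take their target's value; all others are unchanged
relabeled = ids.replace(merge_map)
print(relabeled.tolist())  # [3, 2, 3, 4, 2, 6, 2, 8]
```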

Answer 1:


You could conditionally change df['ID_Code'] and then sum the columns per ID:

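import numpy as np

col = 'ID_Code'
# one condition per target: ID 1 -> 3, IDs 5 and 7 -> 2
cond = [df[col].eq(1),
        df[col].isin([5,7])]

outputs = [3,2]

# relabel the matching rows; every other ID keeps its current value
df[col] = np.select(cond,outputs,default=df[col])

# rows that now share an ID are combined; for these list columns,
# sum concatenates the lists (see the output below)
df1 = df.groupby(col).sum()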

print(df1)


                                             information 1  \
ID_Code                                                      
2                            [1.0, 1.5, 2.0, 180, 192, 23]   
3            [6, 9, 12, 5.0, 5.2, 5.4, 5.6000000000000005]   
4        [0.10200000000000001, 0.10300000000000001, 0.1...   
6                       [84, 126, 168, 210, 252, 294, 336]   
8                                       [62.4, 70.2, 78.0]   

                                             information 2  
ID_Code                                                     
2        [1.0, 1.5, 2.0, 0.08333333333333333, 0.125, 0....  
3        [0.6666666666666666, 1.0, 1.3333333333333333, ...  
4          [0.16666666666666666, 0.25, 0.3333333333333333]  
6        [0.041666666666666664, 0.0625, 0.0833333333333...  
8        [0.016129032258064516, 0.024193548387096774, 0...  
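Answer 1 relies on `groupby().sum()` concatenating the lists stored in object columns, which is what the output above shows. A sketch that states the concatenation explicitly with `groupby().agg` instead, so the intent does not hinge on how `sum` treats object dtype; it also relabels through `DataFrame.assign`, which leaves the original frame's IDs intact. `flatten` is a hypothetical helper, and the small frame is toy data standing in for the question's `df`:

```python
import numpy as np
import pandas as pd


def flatten(series):
    """Concatenate the lists held in one group's cells, in row order."""
    return [item for cell in series for item in cell]


# Toy stand-in for the question's frame: list-valued information columns.
df = pd.DataFrame({
    "ID_Code": [1, 2, 3, 4, 5, 6, 7, 8],
    "information 1": [[1], [2], [3], [4], [5], [6], [7], [8]],
    "information 2": [[10], [20], [30], [40], [50], [60], [70], [80]],
})

col = "ID_Code"
cond = [df[col].eq(1), df[col].isin([5, 7])]
outputs = [3, 2]

# Relabel on a new frame (assign returns a copy), so df keeps its original IDs.
relabeled = df.assign(ID_Code=np.select(cond, outputs, default=df[col]))

# Each group's lists are concatenated explicitly instead of relying on sum().
result = relabeled.groupby(col).agg(flatten).reset_index()
print(result)
```

With this toy data, ID 2 ends up with [2, 5, 7] and ID 3 with [1, 3]: within each group the rows keep their original order, so the target's own values come first only when the target row precedes its sources.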



Answer 2:


You can set ID_Code as the index and update the target row with a list comprehension:

df = df.set_index('ID_Code')
# prepend row 1's lists to row 3's lists, column by column, then drop row 1
df.loc[3] = [x+y for x,y in zip(df.loc[1], df.loc[3])]
df = df.drop(1)
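Answer 2 handles one merge at a time. A sketch that extends the same set-the-index-and-concatenate idea to several merges at once, assuming every non-key column holds Python lists and that no ID is both a source and a target; `merge_rows` is a hypothetical helper name and the frame is toy data. It writes each cell with `DataFrame.at`, a common way to store a list in a single cell, rather than assigning a whole row of lists:

```python
from collections import defaultdict

import pandas as pd


def merge_rows(df, merge_map, key="ID_Code"):
    """Hypothetical helper: fold each source row into its target row.

    merge_map maps source ID -> target ID. For every non-key column the
    target's new list is the sources' lists (in mapping order) followed by
    the target's own list. Source rows are dropped afterwards. Assumes all
    non-key columns hold lists and no ID is both a source and a target.
    """
    out = df.set_index(key)

    # Group the sources by the row that absorbs them, e.g. {3: [1], 2: [5, 7]}.
    sources_by_target = defaultdict(list)
    for source, target in merge_map.items():
        sources_by_target[target].append(source)

    for target, sources in sources_by_target.items():
        for col in out.columns:
            combined = [item for s in sources for item in out.at[s, col]]
            # '+' builds a new list that .at stores in one cell, so the
            # lists inside the input frame are never modified in place.
            out.at[target, col] = combined + out.at[target, col]

    return out.drop(index=list(merge_map)).reset_index()


# Toy stand-in for the question's frame.
df = pd.DataFrame({
    "ID_Code": [1, 2, 3, 4, 5, 6, 7, 8],
    "information 1": [[1], [2], [3], [4], [5], [6], [7], [8]],
    "information 2": [[10], [20], [30], [40], [50], [60], [70], [80]],
})

print(merge_rows(df, {1: 3, 5: 2, 7: 2}))
```

With the toy data this gives [5, 7, 2] for ID 2 and [1, 3] for ID 3, i.e. the sources' values ahead of the target's, in the same layout as the desired output in the question.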


Source: https://stackoverflow.com/questions/62431862/how-to-merge-columns-in-rows-in-a-dataframe-that-fulfill-a-condition-while-dele
