Add new columns to a pandas df after filtering

Posted by 社会主义新天地 on 2019-12-11 07:32:00

Question


I have a df that contains information about various places.

import pandas as pd

d = ({
    'C' : ['08:00:00','XX','08:10:00','XX','08:41:42','XX','08:50:00','XX', '09:00:00', 'XX','09:15:00','XX','09:21:00','XX','09:30:00','XX','09:40:00','XX'],
    'D' : ['Home','','Home','','Away','','Home','','Away','','Home','','Home','','Away','','Home',''],
    'E' : ['Num:','','Num:','','Num:','','Num:','','Num:', '','Num:','','Num:','','Num:', '','Num:', ''],
    'F' : ['1','','1','','1','','1','','1', '','2','','2','','1', '','2',''],   
    'A' : ['A','','A','','A','','A','','A','','A','','A','','A','','A',''],           
    'B' : ['Stop','','Res','','Stop','','Start','','Res','','Stop','','Res','','Start','','Start','']
    })

df = pd.DataFrame(data=d)

I want to export that data by place, as labelled in Column D. I also want to add new columns computed from the events labelled in Column B.

df['C'] = pd.to_timedelta(df['C'], errors="coerce").dt.total_seconds()

incl = ['Home', 'Away']    

for k, g in df[df.D.isin(incl)].groupby('D'):
    Stop = g.loc[df['B'] == 'Stop'].reset_index()['C']
    Start = g.loc[df['B'] == 'Start'].reset_index()['C']
    Res = g.loc[df['B'] == 'Res'].reset_index()['C']

    g['Start_diff'] = Start - Stop
    g['Res_diff'] = Start - Res

The problem is that these events occur multiple times, and each occurrence is numbered in Column F. If we look at the export for Home, we only get the diffs for the first occurrence (F = 1).

Output:

    A   B       C       D       E       F   Start_diff  Res_diff
0   A   Stop    28800   Home    Num:    1   3000        2400
2   A   Res     29400   Home    Num:    1       
6   A   Start   31800   Home    Num:    1       
10  A   Stop    33300   Home    Num:    2       
12  A   Res     33660   Home    Num:    2       
16  A   Start   34800   Home    Num:    2       

Whereas the intended output is:

    A   B       C       D       E       F   Start_diff  Res_diff
0   A   Stop    28800   Home    Num:    1   3000        2400
2   A   Res     29400   Home    Num:    1       
6   A   Start   31800   Home    Num:    1       
10  A   Stop    33300   Home    Num:    2   1500        1140
12  A   Res     33660   Home    Num:    2       
16  A   Start   34800   Home    Num:    2       
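One likely explanation for why only row 0 is filled in the actual output: `reset_index()` relabels the Stop/Start/Res values as 0, 1, ..., and assigning the resulting Series back to `g` aligns on index labels, so only label 0 finds a matching row. A minimal sketch on a toy frame that reuses the Home rows' index labels and times (the frame and variable names here are illustrative, not the original code):

```python
import pandas as pd

# Toy frame with the same index labels and times as the Home rows above.
g = pd.DataFrame(
    {"B": ["Stop", "Res", "Start", "Stop", "Res", "Start"],
     "C": [28800.0, 29400.0, 31800.0, 33300.0, 33660.0, 34800.0]},
    index=[0, 2, 6, 10, 12, 16],
)

# After reset_index, both Series are labelled 0 and 1.
stop = g.loc[g["B"] == "Stop", "C"].reset_index(drop=True)
start = g.loc[g["B"] == "Start", "C"].reset_index(drop=True)

diff = start - stop      # both differences exist here: 3000.0 and 1500.0
g["Start_diff"] = diff   # assignment aligns on labels; g only has label 0 in common

print(diff.tolist())
print(g["Start_diff"].tolist())
```

The second difference (label 1) is computed but silently dropped on assignment, because `g` has no row labelled 1.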

I have tried changing for k, g in df[df.D.isin(incl)].groupby('D'): to for k, g in df[df.D.isin(incl)].groupby('D').F.nunique():

but I get the error TypeError: 'numpy.int64' object is not iterable
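The error is consistent with what `nunique()` returns: a Series holding one count per value of D, not a sequence of groups, so the loop tries to unpack a single integer into `k, g`. A small sketch on made-up rows (a subset of the data above, chosen for illustration) shows the difference, and shows that grouping by both D and F yields one sub-frame per occurrence:

```python
import pandas as pd

# Illustrative subset of the question's data.
df = pd.DataFrame({
    "D": ["Home", "Home", "Away", "Home", "Home"],
    "F": ["1", "1", "1", "2", "2"],
    "C": [28800.0, 31800.0, 31302.0, 33300.0, 34800.0],
})

# nunique() collapses each group to a count, so there is nothing to unpack.
counts = df.groupby("D")["F"].nunique()
print(counts.to_dict())

# Grouping by both columns gives one (key, sub-frame) pair per D/F combination.
keys = [key for key, g in df.groupby(["D", "F"])]
print(keys)
```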


Answer 1:


I believe you need a custom function applied with groupby on columns D and F, followed by mask to blank out the repeated values within each group:

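def f(g):
    # Select from the group itself rather than from the outer df
    Stop = g.loc[g['B'] == 'Stop', 'C']
    Start = g.loc[g['B'] == 'Start', 'C']
    Res = g.loc[g['B'] == 'Res', 'C']
    # Scalar differences are broadcast to every row of the group
    g['Start_diff'] = Start.values[0] - Stop.values[0]
    g['Res_diff'] = Start.values[0] - Res.values[0]

    return g

# group_keys=False keeps the original flat row index in the result
df = df[df.D.isin(incl)].groupby(['D', 'F'], group_keys=False).apply(f)

# Keep the diffs only on the first row of each D/F group
df[['Start_diff', 'Res_diff']] = df[['Start_diff', 'Res_diff']].mask(df.duplicated(['D','F']))
print(df)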
          C     D     E  F  A      B  Start_diff  Res_diff
0   28800.0  Home  Num:  1  A   Stop      3000.0    2400.0
2   29400.0  Home  Num:  1  A    Res         NaN       NaN
4   31302.0  Away  Num:  1  A   Stop      2898.0    1800.0
6   31800.0  Home  Num:  1  A  Start         NaN       NaN
8   32400.0  Away  Num:  1  A    Res         NaN       NaN
10  33300.0  Home  Num:  2  A   Stop      1500.0    1140.0
12  33660.0  Home  Num:  2  A    Res         NaN       NaN
14  34200.0  Away  Num:  1  A  Start         NaN       NaN
16  34800.0  Home  Num:  2  A  Start         NaN       NaN
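As an alternative to the apply-based function above, the same numbers can be obtained by pivoting the times so that each D/F group becomes one row. This is only a sketch under the assumption that every group has exactly one Stop, one Res and one Start; the names `times`, `diffs`, `first` and `out` are introduced here for illustration, and the empty 'XX' rows from the question are left out for brevity.

```python
import pandas as pd

df = pd.DataFrame({
    "C": ["08:00:00", "08:10:00", "08:41:42", "08:50:00", "09:00:00",
          "09:15:00", "09:21:00", "09:30:00", "09:40:00"],
    "D": ["Home", "Home", "Away", "Home", "Away", "Home", "Home", "Away", "Home"],
    "F": ["1", "1", "1", "1", "1", "2", "2", "1", "2"],
    "B": ["Stop", "Res", "Stop", "Start", "Res", "Stop", "Res", "Start", "Start"],
}, index=[0, 2, 4, 6, 8, 10, 12, 14, 16])
df["C"] = pd.to_timedelta(df["C"]).dt.total_seconds()

# One row per (D, F) group, one column per event in B.
times = df.pivot_table(index=["D", "F"], columns="B", values="C", aggfunc="first")
diffs = pd.DataFrame({
    "Start_diff": times["Start"] - times["Stop"],
    "Res_diff": times["Start"] - times["Res"],
}).reset_index()

# Attach the differences to the first row of each group only.
first = ~df.duplicated(["D", "F"])
merged = df.loc[first, ["D", "F"]].merge(diffs, on=["D", "F"], how="left")
merged.index = df.index[first]  # restore the original row labels lost by merge
out = df.join(merged[["Start_diff", "Res_diff"]])
print(out)
```

The remaining rows of each group get NaN in the two new columns, matching the layout of the output above.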


Source: https://stackoverflow.com/questions/50831061/add-new-columns-to-a-pandas-df-after-filtering
