Convert a Python dataframe with multiple rows into one row using pandas?

Anonymous (unverified), submitted 2019-12-03 03:03:02

Question:

Given the following dataframe:

    import pandas as pd

    df = pd.DataFrame({'device_id': ['0', '0', '1', '1', '2', '2'],
                       'p_food':    [0.2, 0.1, 0.3, 0.5, 0.1, 0.7],
                       'p_phone':   [0.8, 0.9, 0.7, 0.5, 0.9, 0.3]})
    print(df)

Output:

      device_id  p_food  p_phone
    0         0     0.2      0.8
    1         0     0.1      0.9
    2         1     0.3      0.7
    3         1     0.5      0.5
    4         2     0.1      0.9
    5         2     0.7      0.3

How can I transform it into the following, with one row per device_id?

    df2 = pd.DataFrame({'device_id': ['0', '1', '2'],
                        'p_food_1':  [0.2, 0.3, 0.1],
                        'p_food_2':  [0.1, 0.5, 0.7],
                        'p_phone_1': [0.8, 0.7, 0.9],
                        'p_phone_2': [0.9, 0.5, 0.3]})
    print(df2)

Output:

      device_id  p_food_1  p_food_2  p_phone_1  p_phone_2
    0         0       0.2       0.1        0.8        0.9
    1         1       0.3       0.5        0.7        0.5
    2         2       0.1       0.7        0.9        0.3

I tried to do this with groupby, apply, agg, and so on,
but I still can't get this transformation to work.

Update
My final code:

    df.drop_duplicates('device_id', keep='first').merge(
        df.drop_duplicates('device_id', keep='last'), on='device_id')

I appreciate su79eu7k's and A-Za-z's time and effort.
Words are not enough to express my gratitude.

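Answer 1:

If you are still looking for an answer using groupby:

    # Select the value columns with a list (double brackets); the tuple form
    # ['p_food', 'p_phone'] is rejected by recent pandas versions.
    df = (df.groupby('device_id')[['p_food', 'p_phone']]
            .apply(lambda x: pd.DataFrame(x.values))
            .unstack()
            .reset_index())
    df.columns = df.columns.droplevel()
    df.columns = ['device_id', 'p_food_1', 'p_food_2', 'p_phone_1', 'p_phone_2']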

You get:

      device_id  p_food_1  p_food_2  p_phone_1  p_phone_2
    0         0       0.2       0.1        0.8        0.9
    1         1       0.3       0.5        0.7        0.5
    2         2       0.1       0.7        0.9        0.3
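The groupby approach above hard-codes the final column names. As an alternative sketch (not the answerer's method), the rows within each device can be numbered with `cumcount` and then moved into columns with `unstack`, so the `_1`, `_2` suffixes are generated from the data. The names `numbered`, `wide`, and the helper column `n` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'device_id': ['0', '0', '1', '1', '2', '2'],
    'p_food': [0.2, 0.1, 0.3, 0.5, 0.1, 0.7],
    'p_phone': [0.8, 0.9, 0.7, 0.5, 0.9, 0.3],
})

# Number the rows inside each device: 1, 2, ... (cumcount itself starts at 0).
# 'n' is a hypothetical helper column, not part of the original data.
numbered = df.assign(n=df.groupby('device_id').cumcount() + 1)

# Move the per-device row number from the index into the columns.
wide = (numbered.set_index(['device_id', 'n'])[['p_food', 'p_phone']]
                .unstack('n'))

# Flatten the (column, n) MultiIndex into names such as 'p_food_1'.
wide.columns = [f'{col}_{n}' for col, n in wide.columns]
wide = wide.reset_index()

print(wide)
```

If some devices have fewer rows than others, the missing positions should come out as NaN rather than raising an error, which may or may not be what you want.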


Answer 2:

    df_m = df.drop_duplicates('device_id', keep='first')\
             .merge(df, on='device_id')\
             .drop_duplicates('device_id', keep='last')\
             [['device_id', 'p_food_x', 'p_food_y', 'p_phone_x', 'p_phone_y']]\
             .reset_index(drop=True)

    print(df_m)

      device_id  p_food_x  p_food_y  p_phone_x  p_phone_y
    0         0       0.2       0.1        0.8        0.9
    1         1       0.3       0.5        0.7        0.5
    2         2       0.1       0.7        0.9        0.3
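The merge-based solutions on this page produce `_x`/`_y` column names, while the question asks for `_1`/`_2`. A minimal sketch of one way to close that gap is to pass the `suffixes` argument of `merge`. It assumes, like the code above, that every device has exactly two rows; the variable names `first` and `last` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'device_id': ['0', '0', '1', '1', '2', '2'],
    'p_food': [0.2, 0.1, 0.3, 0.5, 0.1, 0.7],
    'p_phone': [0.8, 0.9, 0.7, 0.5, 0.9, 0.3],
})

# First and last row of each device (assumes exactly two rows per device).
first = df.drop_duplicates('device_id', keep='first')
last = df.drop_duplicates('device_id', keep='last')

# suffixes replaces the default '_x'/'_y' on overlapping column names.
df_m = first.merge(last, on='device_id', suffixes=('_1', '_2'))

# Reorder the columns to match the layout requested in the question.
df_m = df_m[['device_id', 'p_food_1', 'p_food_2', 'p_phone_1', 'p_phone_2']]

print(df_m)
```

With more than two rows per device, the rows between the first and the last would be dropped silently, so this variant only fits the two-row case.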

