Pandas dataframe merge

Submitted by Deadly on 2020-01-16 14:10:00

Question


I have a pandas DataFrame built by concatenating four DataFrames; it looks like this:

In [121]: all
Out[121]:
        E    H    N    S
102P    Y  NaN  NaN  NaN
103R    Y  NaN  NaN  NaN
102P  NaN  NaN    Y  NaN
103R  NaN  NaN    Y  NaN
109F  NaN  NaN    Y  NaN
103R  NaN    Y  NaN  NaN
109F  NaN    Y  NaN  NaN
102P  NaN  NaN  NaN    Y
103R  NaN  NaN  NaN    Y
109F  NaN  NaN  NaN    Y

I want to consolidate this into a DataFrame with one row per label, like this:

        E    H    N    S
102P    Y  NaN    Y    Y
103R    Y    Y    Y    Y
109F  NaN    Y    Y    Y

How can I merge the rows that share the same label in all.index?


Answer 1:


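Do a groupby on the index (I presume from the data you posted that the values 102P... are in the index; df below is the concatenated frame you called all) and count the values. count() skips NaN, so the result is a DataFrame of zeros and ones recording whether each label has a "Y" in each column. Then replace those numbers with the values you want.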

>>> import numpy as np
>>> ndf = df.groupby(level=0).count().astype(object)  # object dtype can hold both 'Y' and NaN
>>> ndf[ndf == 1] = 'Y'
>>> ndf[ndf == 0] = np.nan
>>> ndf
         E    H  N  S
label                
102P     Y  NaN  Y  Y
103R     Y    Y  Y  Y
109F   NaN    Y  Y  Y
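A self-contained sketch of the groupby/count approach, assuming the four frames look like the ones in the question (labels_per_column and COLUMNS are hypothetical names used only to rebuild that data). Instead of the two masked assignments it uses DataFrame.where to turn positive counts into "Y" and everything else into NaN, which is a variant of the answer's code rather than a copy of it.

```python
import numpy as np
import pandas as pd

COLUMNS = ["E", "H", "N", "S"]

# Hypothetical reconstruction of the four frames from the question: each
# frame marks one column with "Y" for the labels it contains.
labels_per_column = {
    "E": ["102P", "103R"],
    "N": ["102P", "103R", "109F"],
    "H": ["103R", "109F"],
    "S": ["102P", "103R", "109F"],
}

frames = []
for col, labels in labels_per_column.items():
    part = pd.DataFrame(np.nan, index=labels, columns=COLUMNS, dtype=object)
    part[col] = "Y"
    frames.append(part)

# The concatenated frame with repeated index labels ("all" in the question).
df = pd.concat(frames)

# count() skips NaN, so each cell is the number of "Y" values per label and column.
counts = df.groupby(level=0).count()

# Keep "Y" wherever the count is positive; where() fills the other cells with NaN.
result = pd.DataFrame("Y", index=counts.index, columns=counts.columns).where(counts > 0)
print(result)
```

Testing counts > 0 rather than == 1 means the same code also handles labels that appear more than once in a column.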

If a label can have the same column filled more than once (repeated rows), its count will be greater than one, so change the condition from ndf[ndf == 1] to ndf[ndf > 0].
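The small frame below is hypothetical data in which 102P has "Y" in column E twice, to show why == 1 misses repeated labels while > 0 catches them. It also shows GroupBy.first(), which is not part of the answer above: it takes the first non-null value in each column per label, so the "Y" values come through without any counting.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "102P" has "Y" in column E on two separate rows.
df = pd.DataFrame(
    {"E": ["Y", "Y", np.nan], "N": [np.nan, np.nan, "Y"]},
    index=["102P", "102P", "109F"],
)

counts = df.groupby(level=0).count()  # counts.loc["102P", "E"] is 2
exact_one = counts == 1               # misses the repeated label
at_least_one = counts > 0             # catches it

# Alternative: first() keeps the first non-null value per column and label;
# a label with no value in a column gets NaN there.
firsts = df.groupby(level=0).first()
print(firsts)
```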

But why concatenate the DataFrames instead of combining them with combine_first? For example:

>>> df1
      E   H   N   S
0                  
102P  Y NaN NaN NaN
103R  Y NaN NaN NaN
>>> df2
       E   H  N   S
0                  
102P NaN NaN  Y NaN
103R NaN NaN  Y NaN
109F NaN NaN  Y NaN

...

>>> from functools import reduce  # a builtin in Python 2; must be imported in Python 3
>>> reduce(lambda first, second: first.combine_first(second),
...        [df1, df2, df3, df4], pd.DataFrame())
        E    H  N  S
0                   
102P    Y  NaN  Y  Y
103R    Y    Y  Y  Y
109F  NaN    Y  Y  Y
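A runnable sketch of the combine_first approach, assuming df3 and df4 (elided above) mark columns H and S the way the question's data suggests; marked_frame and COLUMNS are hypothetical helpers for rebuilding that data. As a small variation on the answer, the reduction starts from df1 instead of an empty pd.DataFrame().

```python
from functools import reduce

import numpy as np
import pandas as pd

COLUMNS = ["E", "H", "N", "S"]


def marked_frame(col, labels):
    """Hypothetical helper: an all-NaN frame with "Y" in a single column."""
    frame = pd.DataFrame(np.nan, index=labels, columns=COLUMNS, dtype=object)
    frame[col] = "Y"
    return frame


df1 = marked_frame("E", ["102P", "103R"])
df2 = marked_frame("N", ["102P", "103R", "109F"])
df3 = marked_frame("H", ["103R", "109F"])  # assumed contents
df4 = marked_frame("S", ["102P", "103R", "109F"])  # assumed contents

# combine_first keeps the left frame's values and fills its NaN cells (and
# any rows it lacks) from the right frame; reduce folds this over the list.
combined = reduce(lambda first, second: first.combine_first(second),
                  [df1, df2, df3, df4])
print(combined)
```

Because each frame has at most one row per label, no counting step is needed: the union of the indexes gives one row per label directly.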


Source: https://stackoverflow.com/questions/18922934/pandas-dataframe-merge
