Combining columns of dataframe [duplicate]

Submitted by 南楼画角 on 2021-02-05 11:26:05

Question


I have a DataFrame like this:

   c1   c2   c3
0   a   NaN  NaN
1  NaN   b   NaN
2  NaN  NaN   c
3  NaN   b   NaN
4   a   NaN  NaN

I want to combine these three columns into a single column, like this:

    c4
0    a
1    b
2    c
3    b
4    a

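Here is the code to create the above DataFrame (np.nan is used because the np.NaN alias was removed in NumPy 2.0):

import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})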

Answer 1:


Backfilling along each row (bfill with axis=1) and then taking the first column is one option:

a.bfill(axis=1).iloc[:,0]

0    a
1    b
2    c
3    b
4    a
Name: c1, dtype: object
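Why this works: bfill(axis=1) copies each value leftwards into the NaN cells before it, so column 0 ends up holding the first non-missing value of the row. A minimal sketch on a hypothetical frame (demo is made up for illustration) where one row has two values shows that the leftmost value wins:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has values in both c2 and c3
demo = pd.DataFrame({
    'c1': ['a', np.nan],
    'c2': [np.nan, 'b'],
    'c3': [np.nan, 'z'],
})

# Backfill along each row, then keep only the first column
combined = demo.bfill(axis=1).iloc[:, 0]
print(combined.tolist())  # ['a', 'b']
```

So if a row can contain more than one value, this approach silently keeps the leftmost one.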

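Another option is a simple stack, which gets rid of the NaNs (stack drops missing values by default in pandas 2.x; the newer stack implementation keeps them, so chain .dropna() there):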

a.stack().reset_index(level=1, drop=True) 


0    a
1    b
2    c
3    b
4    a
dtype: object
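Under the hood, stack turns the columns into an inner index level, so each surviving value is labelled (row, column), and reset_index(level=1, drop=True) then discards the column label. The sketch below assumes pandas 2.x or newer and adds an explicit .dropna() so it behaves the same whether or not the installed stack drops NaNs. It also checks the assumption this approach relies on, namely exactly one non-missing value per row:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})

# Columns become the inner index level; drop missing cells explicitly
stacked = a.stack().dropna()
print(stacked.index.tolist())
# [(0, 'c1'), (1, 'c2'), (2, 'c3'), (3, 'c2'), (4, 'c1')]

# Discard the column level, leaving the original row labels
c4 = stacked.reset_index(level=1, drop=True)

# Sanity check: one value per row means unique row labels and matching length
assert c4.index.is_unique and len(c4) == len(a)
print(c4.tolist())  # ['a', 'b', 'c', 'b', 'a']
```

If a row had two values, its label would appear twice in c4 and the check above would fail.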

Another interesting option you don't see every day is to use NumPy directly. Here's a modified version of Divakar's justify utility that works with object DataFrames.

justify(a.to_numpy(), invalid_val=np.nan)[:,0]
# array(['a', 'b', 'c', 'b', 'a'], dtype=object)

# as a Series
pd.Series(justify(a.to_numpy(), invalid_val=np.nan)[:,0], index=a.index)

0    a
1    b
2    c
3    b
4    a
dtype: object
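As a rough illustration of how such a justify utility can work, here is a minimal hypothetical helper, justify_left (a simplified sketch, not Divakar's code). It assumes left-justification along rows and treats anything pd.isna flags as invalid: sort each row's validity mask so the True slots come first, then scatter the valid values into those slots.

```python
import numpy as np
import pandas as pd

def justify_left(arr, invalid_val=np.nan):
    """Hypothetical helper: push the valid cells of each row to the left.

    Any cell flagged by pd.isna is treated as invalid; invalid_val is
    only used to fill the vacated slots on the right.
    """
    arr = np.asarray(arr, dtype=object)
    mask = ~pd.isna(arr)                             # True where a cell holds a value
    justified_mask = np.sort(mask, axis=1)[:, ::-1]  # per row: all True first, then False
    out = np.full(arr.shape, invalid_val, dtype=object)
    # Both boolean selections walk the array row by row, so each row's
    # values land in that row's leftmost slots, in their original order.
    out[justified_mask] = arr[mask]
    return out

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})

c4 = pd.Series(justify_left(a.to_numpy())[:, 0], index=a.index)
print(c4.tolist())  # ['a', 'b', 'c', 'b', 'a']
```

Because the work is done on the whole array at once, this avoids a Python-level loop over rows.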



Answer 2:


You could try this:

import pandas as pd
import numpy as np
a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})

newdf=pd.DataFrame({'c4':a.fillna('').values.sum(axis=1)})

Output:

newdf

  c4
0  a
1  b
2  c
3  b
4  a
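This works because summing an object array applies Python's + across each row, and for strings that is concatenation, so 'a' + '' + '' is just 'a'. The flip side, shown in this minimal sketch on a made-up frame, is that a row with two values is concatenated rather than reduced to one:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second row has two non-missing values
demo = pd.DataFrame({
    'c1': [np.nan, 'x'],
    'c2': ['b', np.nan],
    'c3': [np.nan, 'y'],
})

# Replace NaN with '' so every cell is a string, then concatenate across columns
joined = demo.fillna('').to_numpy().sum(axis=1)
print(joined.tolist())  # ['b', 'xy']
```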

I just saw this option in jpp's answer: it takes advantage of the fact that np.nan != np.nan and uses a list comprehension. It may well be the fastest approach:

newdf=pd.DataFrame({'c4':[i  for row in a.values for i in row if i == i]})
print(newdf)
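The i == i test works because NaN compares unequal to itself, while ordinary strings compare equal to themselves. Note that the flat comprehension yields one entry per non-missing cell, so it only lines up with the rows when each row has exactly one value. A per-row variant, sketched below as an assumption rather than jpp's code, keeps the first non-missing value of each row (or NaN when a row is empty) and preserves the original index:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})

nan = float('nan')
print(nan == nan)  # False: NaN never equals itself

# For each row, take the first cell that equals itself (i.e. is not NaN)
first_valid = [next((v for v in row if v == v), np.nan) for row in a.to_numpy()]
newdf = pd.DataFrame({'c4': first_valid}, index=a.index)
print(newdf['c4'].tolist())  # ['a', 'b', 'c', 'b', 'a']
```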


Source: https://stackoverflow.com/questions/62751210/combining-columns-of-dataframe
