Combining columns of dataframe [duplicate]

Question

I have a dataframe like this:

   c1   c2   c3
0   a   NaN  NaN
1  NaN   b   NaN
2  NaN  NaN   c
3  NaN   b   NaN
4   a   NaN  NaN

I want to combine these three columns into a single column that holds the one non-NaN value from each row.

Here is the code to make the above data frame:

import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})

Answer 1:

Back-filling across each row (axis=1) and then taking the first column is one option:

a.bfill(axis=1).iloc[:,0]

0    a
1    b
2    c
3    b
4    a
Name: c1, dtype: object
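To see why taking the first column works, it can help to look at the intermediate frame. The following is a minimal sketch, not part of the original answer; the names `filled` and `combined` are made up for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})

# With axis=1, each NaN is replaced by the next non-NaN value to its right,
# so the first column ends up holding the first non-NaN value of every row.
filled = a.bfill(axis=1)
print(filled)

# Take that first column and give it a neutral name (hypothetical: "combined").
combined = filled.iloc[:, 0].rename('combined')
print(combined.tolist())
```

If a row contains more than one non-NaN value, this keeps the leftmost one.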

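Another option is a simple stack. Older pandas versions drop the NaNs while stacking; the explicit dropna() gives the same result on versions that keep them.

a.stack().dropna().reset_index(level=1, drop=True)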

0    a
1    b
2    c
3    b
4    a
dtype: object

A less common option is to drop down to NumPy. This uses a modified version of Divakar's justify utility that works with object DataFrames (the justify function itself is not included in this post):

justify(a.to_numpy(), invalid_val=np.nan)[:,0]
# array(['a', 'b', 'c', 'b', 'a'], dtype=object)

# as a Series
pd.Series(justify(a.to_numpy(), invalid_val=np.nan)[:,0], index=a.index)

0    a
1    b
2    c
3    b
4    a
dtype: object
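Since `justify` is not defined anywhere in this post, here is a simplified stand-in so the calls above can be tried out. It is only a sketch under assumptions: it justifies to the left only, it is not Divakar's original code, and the NaN handling via `pd.notna` is my own choice.

```python
import numpy as np
import pandas as pd

def justify(arr, invalid_val=np.nan):
    """Simplified stand-in: push the valid entries of each row to the left."""
    arr = np.asarray(arr, dtype=object)
    # True where an entry is valid; NaN needs pd.notna because NaN != NaN.
    if pd.isna(invalid_val):
        mask = pd.notna(arr)
    else:
        mask = arr != invalid_val
    # Sorting booleans puts False first, so reverse each row to get True on the left.
    justified_mask = np.sort(mask, axis=1)[:, ::-1]
    out = np.full(arr.shape, invalid_val, dtype=object)
    # Boolean indexing walks both arrays row by row, so values stay in their rows.
    out[justified_mask] = arr[mask]
    return out

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})

result = pd.Series(justify(a.to_numpy(), invalid_val=np.nan)[:, 0], index=a.index)
print(result.tolist())
```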

Answer 2:

You could try this:

import pandas as pd
import numpy as np
a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})

# Replace NaN with '' and concatenate the strings in each row
newdf = pd.DataFrame({'c4': a.fillna('').values.sum(axis=1)})

Output:

newdf

  c4
0  a
1  b
2  c
3  b
4  a

I also came across this option in jpp's answer, which takes advantage of the fact that np.nan != np.nan and uses a list comprehension. It might be the fastest approach, but it relies on each row holding exactly one non-NaN value:

newdf = pd.DataFrame({'c4': [i for row in a.values for i in row if i == i]})
print(newdf)
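The `i == i` filter works because NaN is the only value here that does not compare equal to itself. Below is a small sketch of that idea with a guard for the one-value-per-row assumption; the guard and the name `values` are my additions, not part of jpp's answer.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})

print(np.nan == np.nan)  # False, so `v == v` filters out NaN

# The comprehension flattens the frame, so the result only lines up with
# the rows when every row holds exactly one non-NaN value.
if not (a.notna().sum(axis=1) == 1).all():
    raise ValueError("each row must contain exactly one non-NaN value")

values = [v for row in a.to_numpy() for v in row if v == v]
newdf = pd.DataFrame({'c4': values}, index=a.index)
print(newdf)
```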

Source: https://stackoverflow.com/questions/62751210/combining-columns-of-dataframe

Tags: python, pandas, dataframe