Question
I have a DataFrame like this:
c1 c2 c3
0 a NaN NaN
1 NaN b NaN
2 NaN NaN c
3 NaN b NaN
4 a NaN NaN
I want to combine these three columns into a single column, like this:
c4
0 a
1 b
2 c
3 b
4 a
Here is the code to make the above data frame:
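import numpy as np
import pandas as pd

# np.nan is used instead of np.NaN, an alias that was removed in NumPy 2.0
a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})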
Answer 1:
bfilling is one option:
a.bfill(axis=1).iloc[:,0]
0 a
1 b
2 c
3 b
4 a
Name: c1, dtype: object
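Note that the result above is a Series still named c1, whereas the question asks for a column called c4. A minimal sketch of the extra renaming step, assuming a one-column DataFrame is the desired output (the names c4 and result are just illustrative):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan],
})

# Back-fill along each row so the first column holds the first non-null value,
# then take that column and give it the name asked for in the question.
c4 = a.bfill(axis=1).iloc[:, 0].rename('c4')

# Wrap the Series in a one-column DataFrame.
result = c4.to_frame()
print(result)
```

Unlike stack, this keeps one entry per original row even if a row is entirely NaN (that row simply stays NaN).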
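Another is a simple stack, which gets rid of the NaNs. This relies on stack dropping NaNs; if your pandas version keeps them, call .dropna() on the stacked result before resetting the index.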
a.stack().reset_index(level=1, drop=True)
0 a
1 b
2 c
3 b
4 a
dtype: object
Another interesting option you don't see every day is using the power of NumPy. Here's a modified version of Divakar's justify utility that works with object DataFrames.
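The definition of justify is not reproduced in this post. As a rough sketch only, assuming the usual mask-and-sort approach, a version that handles object arrays might look like the following. This is a reconstruction for illustration and not necessarily the answerer's exact code. The parameter names axis and side are assumptions.

```python
import numpy as np
import pandas as pd

def justify(arr, invalid_val=np.nan, axis=1, side="left"):
    """Push valid entries of a 2D array to one side (sketch, not the original)."""
    # Mask of valid entries; pd.notna copes with object arrays that mix str and NaN.
    if isinstance(invalid_val, float) and np.isnan(invalid_val):
        mask = pd.notna(arr)
    else:
        mask = arr != invalid_val
    # Sorting booleans puts False first, so the True values end up right/bottom.
    justified_mask = np.sort(mask, axis=axis)
    if side in ("left", "up"):
        justified_mask = np.flip(justified_mask, axis=axis)
    # Object dtype so strings and NaN can live in the same output array.
    out = np.full(arr.shape, invalid_val, dtype=object)
    if axis == 1:
        out[justified_mask] = arr[mask]
    else:
        out.T[justified_mask.T] = arr.T[mask.T]
    return out

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan],
})
first_col = justify(a.to_numpy(), invalid_val=np.nan)[:, 0]
```

With every row left-justified, the first column of the result holds the first non-null value of each row.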
justify(a.to_numpy(), invalid_val=np.nan)[:,0]
# array(['a', 'b', 'c', 'b', 'a'], dtype=object)
# as a Series
pd.Series(justify(a.to_numpy(), invalid_val=np.nan)[:,0], index=a.index)
0 a
1 b
2 c
3 b
4 a
dtype: object
Answer 2:
You could try this:
import pandas as pd
import numpy as np
a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})
newdf=pd.DataFrame({'c4':a.fillna('').values.sum(axis=1)})
Output:
newdf
c4
0 a
1 b
2 c
3 b
4 a
I also came across this option in jpp's answer, which takes advantage of the fact that np.nan != np.nan and uses a list comprehension. It may well be the fastest way, but note that it flattens the whole frame, so it assumes exactly one non-NaN value per row:
newdf=pd.DataFrame({'c4':[i for row in a.values for i in row if i == i]})
print(newdf)
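If a row might contain zero or several non-NaN values, the flattened comprehension no longer lines up with the original rows. A hedged row-wise variant of the same `v == v` trick is sketched below. The helper name first_valid is hypothetical, and the check only filters NaN (None would pass as valid).

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan],
})

def first_valid(row):
    # NaN is the only value for which v == v is False, so this skips NaNs
    # and falls back to NaN when the row has no valid entry.
    return next((v for v in row if v == v), np.nan)

# One result per original row, keeping the original index.
newdf = pd.DataFrame({'c4': [first_valid(row) for row in a.to_numpy()]},
                     index=a.index)
print(newdf)
```

This keeps exactly one output value per input row, at the cost of a Python-level function call per row.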
Source: https://stackoverflow.com/questions/62751210/combining-columns-of-dataframe