Merging multiple columns into one column in pandas

好久不见. Submitted on 2019-12-07 14:55:41

Question


I have a dataframe called ref (the first dataframe) with columns c1, c2, c3 and c4.

import pandas as pd
ref = pd.DataFrame([[1,3,.3,7],[0,4,.5,4.5],[2,5,.6,3]], columns=['c1','c2','c3','c4'])
print(ref)
   c1  c2   c3   c4
0   1   3  0.3  7.0
1   0   4  0.5  4.5
2   2   5  0.6  3.0

I want to create a new column, c5 (in a second dataframe), that holds all the values from columns c1, c2, c3 and c4.

I tried concat and merge on the columns, but I cannot get it to work.

Please let me know if you have a solution.


Answer 1:


You can use unstack to turn the DataFrame into a single Series, then concat that Series to the original:

print (pd.concat([ref, ref.unstack().reset_index(drop=True).rename('c5')], axis=1))
     c1   c2   c3   c4   c5
0   1.0  3.0  0.3  7.0  1.0
1   0.0  4.0  0.5  4.5  0.0
2   2.0  5.0  0.6  3.0  2.0
3   NaN  NaN  NaN  NaN  3.0
4   NaN  NaN  NaN  NaN  4.0
5   NaN  NaN  NaN  NaN  5.0
6   NaN  NaN  NaN  NaN  0.3
7   NaN  NaN  NaN  NaN  0.5
8   NaN  NaN  NaN  NaN  0.6
9   NaN  NaN  NaN  NaN  7.0
10  NaN  NaN  NaN  NaN  4.5
11  NaN  NaN  NaN  NaN  3.0
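
To see why this works, it can help to split the one-liner into its steps. The sketch below rebuilds the ref frame from the question; the names stacked, c5 and out are illustrative only and do not appear in the answer.

```python
import pandas as pd

ref = pd.DataFrame([[1, 3, .3, 7], [0, 4, .5, 4.5], [2, 5, .6, 3]],
                   columns=['c1', 'c2', 'c3', 'c4'])

# On a frame with a plain (non-MultiIndex) index, unstack() returns a Series
# indexed by (column label, row label), ordered column by column.
stacked = ref.unstack()

# Drop that MultiIndex so the values are simply numbered 0..11, then name it.
c5 = stacked.reset_index(drop=True).rename('c5')

# concat with axis=1 aligns on the index, so c1..c4 are NaN for rows 3..11.
out = pd.concat([ref, c5], axis=1)
print(out)
```

The NaN padding is also why c1 and c2 show up as floats in the result.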

An alternative way to create the Series is to convert the DataFrame to a NumPy array with values and then flatten it column by column with ravel('F'):

    print (pd.concat([ref, pd.Series(ref.values.ravel('F'), name='c5')], axis=1))
         c1   c2   c3   c4   c5
    0   1.0  3.0  0.3  7.0  1.0
    1   0.0  4.0  0.5  4.5  0.0
    2   2.0  5.0  0.6  3.0  2.0
    3   NaN  NaN  NaN  NaN  3.0
    4   NaN  NaN  NaN  NaN  4.0
    5   NaN  NaN  NaN  NaN  5.0
    6   NaN  NaN  NaN  NaN  0.3
    7   NaN  NaN  NaN  NaN  0.5
    8   NaN  NaN  NaN  NaN  0.6
    9   NaN  NaN  NaN  NaN  7.0
    10  NaN  NaN  NaN  NaN  4.5
    11  NaN  NaN  NaN  NaN  3.0
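
The 'F' argument is what makes ravel walk the array column by column. A small sketch comparing it with the default order, assuming the same ref frame as in the question (by_rows and by_cols are illustrative names):

```python
import pandas as pd

ref = pd.DataFrame([[1, 3, .3, 7], [0, 4, .5, 4.5], [2, 5, .6, 3]],
                   columns=['c1', 'c2', 'c3', 'c4'])

values = ref.values            # 3x4 NumPy array, upcast to float

by_rows = values.ravel()       # default 'C' order: row by row
by_cols = values.ravel('F')    # 'F' (Fortran) order: column by column

print(by_rows)
print(by_cols)
```

Only by_cols keeps the values of c1 together, followed by c2, c3 and c4, which is the order wanted for c5.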



Answer 2:


using join + ravel('F')

ref.join(pd.Series(ref.values.ravel('F')).to_frame('c5'), how='right')

using join + T.ravel()

ref.join(pd.Series(ref.values.T.ravel()).to_frame('c5'), how='right')

pd.concat + T.stack() + rename

pd.concat([ref, ref.T.stack().reset_index(drop=True).rename('c5')], axis=1)

way too many transposes + append (DataFrame.append was removed in pandas 2.0, so this one needs an older pandas)

ref.T.append(ref.T.stack().reset_index(drop=True).rename('c5')).T

combine_first + ravel('F') <--- my favorite

ref.combine_first(pd.Series(ref.values.ravel('F')).to_frame('c5'))

All yield

     c1   c2   c3   c4   c5
0   1.0  3.0  0.3  7.0  1.0
1   0.0  4.0  0.5  4.5  0.0
2   2.0  5.0  0.6  3.0  2.0
3   NaN  NaN  NaN  NaN  3.0
4   NaN  NaN  NaN  NaN  4.0
5   NaN  NaN  NaN  NaN  5.0
6   NaN  NaN  NaN  NaN  0.3
7   NaN  NaN  NaN  NaN  0.5
8   NaN  NaN  NaN  NaN  0.6
9   NaN  NaN  NaN  NaN  7.0
10  NaN  NaN  NaN  NaN  4.5
11  NaN  NaN  NaN  NaN  3.0
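
On pandas 2.0 and later, where DataFrame.append no longer exists, the transpose-and-append variant might be rewritten with pd.concat. This is only a sketch of one possible substitute, not part of the original answer; it builds the Series with ravel('F') and uses the same ref frame.

```python
import pandas as pd

ref = pd.DataFrame([[1, 3, .3, 7], [0, 4, .5, 4.5], [2, 5, .6, 3]],
                   columns=['c1', 'c2', 'c3', 'c4'])

# All values, column by column, as a Series named 'c5'.
c5 = pd.Series(ref.values.ravel('F'), name='c5')

# A one-row frame labelled 'c5' stands in for the appended Series;
# concat stacks it under the transposed frame, and .T flips everything back.
out = pd.concat([ref.T, c5.to_frame().T]).T
print(out)
```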



Answer 3:


Use list(zip(...)) to pack the values of each row into a tuple (df1 is the source frame, df2 the frame that receives c5):

d=list(zip(df1.c1,df1.c2,df1.c3,df1.c4))
df2['c5']=pd.Series(d)
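
Note that this gives one tuple per row rather than a single long column. A runnable sketch, assuming df1 is the ref frame from the question and df2 is simply a copy of it (the answer does not define either name, so both are stand-ins):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 3, .3, 7], [0, 4, .5, 4.5], [2, 5, .6, 3]],
                   columns=['c1', 'c2', 'c3', 'c4'])
df2 = df1.copy()  # stand-in target frame

# zip walks the four columns in parallel, yielding one tuple per row.
d = list(zip(df1.c1, df1.c2, df1.c3, df1.c4))
df2['c5'] = pd.Series(d)
print(df2)
```

The frame keeps its three rows, and each c5 cell holds the four values of that row.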



Answer 4:


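Try this one; it works as you expected:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,3,4],[2,3,4,5],[3,4,5,6]], columns=['c1','c2','c3','c4'])
print(df)
r = len(df['c1'])   # number of original rows
c = len(list(df))   # number of original columns

# all values, column by column
ndata = list(df.c1) + list(df.c2) + list(df.c3) + list(df.c4)
r = len(ndata) - r  # number of NaN rows needed as padding
t = r*c
dfnan = pd.DataFrame(np.reshape([np.nan]*t, (r,c)), columns=list(df))
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, dfnan], ignore_index=True)
df['c5'] = ndata
print(df)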





Answer 5:


This could be a fast option, and you may be able to use it inside a loop. Note that it concatenates the values of each row as strings instead of stacking the columns.

import numpy as np

import pandas as pd

df = pd.DataFrame([[1,2,3,4],[2,3,4,5],[3,4,5,6]], columns=['c1','c2','c3','c4'])

df['c5'] = df.iloc[:,0].astype(str) + df.iloc[:,1].astype(str) + df.iloc[:,2].astype(str) + df.iloc[:,3].astype(str)

Greetings
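
For the frame in this answer, the result is one string per row with the digits run together. The sketch below shows that, plus a variant with a separator between the values; the separator, the cols list and the column name c5_sep are additions for illustration and are not in the answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]],
                  columns=['c1', 'c2', 'c3', 'c4'])
cols = ['c1', 'c2', 'c3', 'c4']

# Same idea as the answer: cast each column to str and concatenate row-wise.
df['c5'] = (df['c1'].astype(str) + df['c2'].astype(str)
            + df['c3'].astype(str) + df['c4'].astype(str))

# Variant: join the string values of each row with a separator.
df['c5_sep'] = df[cols].astype(str).agg('-'.join, axis=1)
print(df)
```

A separator keeps the original values recoverable, which plain concatenation does not once the numbers have more than one digit.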



Source: https://stackoverflow.com/questions/41627678/merging-multiple-columns-into-one-columns-in-pandas
