Concatenating all columns in pandas dataframe

Submitted by 与世无争的帅哥 on 2021-01-28 23:27:07


I am trying to concatenate all columns of a pandas dataframe so that I end up with 1 column that contains all the values from the dataframe. The following code does this:

df2 = pd.concat([df[0], df[1], df[2], df[3], df[4], df[5], df[6], df[7]])

But I would like to be able to do this with dataframes that have different numbers of columns. When I tried:

dfpr2 = pd.concat([df.columns])

I get the following error: "cannot concatenate object of type '<class 'pandas.core.indexes.range.RangeIndex'>'; only Series and DataFrame objs are valid"

Is there a way to get around this? I tried setting ignore_index=True, but that did not seem to help either. Thanks!!
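
A minimal sketch of one way to handle any number of columns: pd.concat expects a list of Series or DataFrames, while df.columns only holds the column labels, so the list can be built by indexing the frame with each label. The small frame and the name `stacked` below are made up for illustration.

```python
import pandas as pd

# Stand-in frame with integer column labels, like df[0], df[1], ... in the question.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# df.columns yields only the labels; df[col] yields the Series that pd.concat
# accepts, however many columns the frame has.
stacked = pd.concat([df[col] for col in df.columns], ignore_index=True)

print(stacked.tolist())  # [1, 4, 2, 5, 3, 6]
```

With ignore_index=True the result gets a fresh 0..n-1 index instead of repeating the original row labels once per column.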


IIUC, you can use df.astype(str).sum(axis=1):

import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C'],
                   'B': [0, 1, 2],
                   'C': ['2019-01-10', '2020-01-10', '2021-01-10']})

df['hash'] = df.astype(str).sum(axis=1)


   A  B           C          hash
0  A  0  2019-01-10  A02019-01-10
1  B  1  2020-01-10  B12020-01-10
2  C  2  2021-01-10  C22021-01-10

If you need a custom delimiter, use .agg:

df.astype(str).agg('|'.join, axis=1)

0    A|0|2019-01-10
1    B|1|2020-01-10
2    C|2|2021-01-10


This is a simple way of concatenating column values:

df1 = df['1st Column Name'] + df['2nd Column Name'] + ...
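
One caveat with `+`: it only joins text columns, so a numeric column has to be cast to str first. A rough sketch, assuming the sample frame from the earlier answer; `as_text`, `two_cols` and `all_cols` are names invented here, and the reduce call is just one way to avoid typing out every column.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C'],
                   'B': [0, 1, 2],
                   'C': ['2019-01-10', '2020-01-10', '2021-01-10']})

# Cast everything to str so that + means string concatenation for every column.
as_text = df.astype(str)

# Two columns written out by hand.
two_cols = as_text['A'] + as_text['B']

# Any number of columns: fold + over the columns from left to right.
all_cols = reduce(operator.add, (as_text[col] for col in as_text.columns))

print(two_cols.tolist())
print(all_cols.tolist())
```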


Timings for the different methods:

%timeit df.iloc[:,0].str.cat(df.iloc[:,1:].astype(str),',')
880 µs ± 28.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.astype(str).agg('|'.join,axis=1)
1.45 ms ± 39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.astype(str).sum(axis=1)
562 µs ± 11.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit [','.join(ent) for ent in df.astype(str).to_numpy()]
350 µs ± 6.48 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I think @cs95 has a Stack Overflow post that talks about strings. For strings, the computation is much faster when it is done within Python, as the list comprehension timing above suggests.
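
The list comprehension returns a plain Python list rather than a Series. A small sketch of wrapping it back up so it lines up with the frame's index, assuming the same sample frame as above (`joined` and `via_agg` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C'],
                   'B': [0, 1, 2],
                   'C': ['2019-01-10', '2020-01-10', '2021-01-10']})

# Join each row's values in plain Python, then reuse df's index for the result.
joined = pd.Series([','.join(row) for row in df.astype(str).to_numpy()],
                   index=df.index)

# The pandas-only equivalent, for comparison.
via_agg = df.astype(str).agg(','.join, axis=1)

print(joined.tolist() == via_agg.tolist())  # True
```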

