How to concatenate multiple column values into a single column in a pandas DataFrame

梦谈多话 2020-12-02 14:25

This question is the same as one posted earlier, except that I want to concatenate three columns instead of two.

Here is the code for combining two columns:

         


        
11 answers
  • 2020-12-02 14:50
    from pandas import DataFrame

    df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})
    
    df['combined'] = df['foo'].astype(str)+'_'+df['bar'].astype(str)
    

    If you concatenate with a string such as '_', first convert each column you want to join to string type; after that you can concatenate the columns with +.
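
    The snippet above joins only two columns, while the question asks for three. A minimal sketch of the same idea extended to the third column, assuming the example frame from this answer:

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'],
                   'bar': [1, 2, 3],
                   'new': ['apple', 'banana', 'pear']})

# Cast every column to str before joining; 'bar' holds integers,
# and adding a str Series to an int Series raises a TypeError.
df['combined'] = (df['foo'].astype(str) + '_'
                  + df['bar'].astype(str) + '_'
                  + df['new'].astype(str))

print(df['combined'].tolist())
# ['a_1_apple', 'b_2_banana', 'c_3_pear']
```

    The casts on 'foo' and 'new' are redundant here because those columns already hold strings, but they keep the expression safe if the dtypes change.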

  • Just wanted to make a time comparison for both solutions (on a 30K-row DataFrame):

    In [1]: df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})
    
    In [2]: big = pd.concat([df] * 10**4, ignore_index=True)
    
    In [3]: big.shape
    Out[3]: (30000, 3)
    
    In [4]: %timeit big.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
    1 loop, best of 3: 881 ms per loop
    
    In [5]: %timeit big['bar'].astype(str)+'_'+big['foo']+'_'+big['new']
    10 loops, best of 3: 44.2 ms per loop
    

    A few more options:

    In [6]: %timeit big.iloc[:, :-1].astype(str).add('_').sum(axis=1).str.cat(big.new)
    10 loops, best of 3: 72.2 ms per loop
    
    In [11]: %timeit big.astype(str).add('_').sum(axis=1).str[:-1]
    10 loops, best of 3: 82.3 ms per loop
    
  • 2020-12-02 14:51

    I think you are missing one %s

    df['combined']=df.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
    
  • 2020-12-02 14:55

    Possibly the fastest solution is to operate in plain Python:

    from pandas import Series

    Series(
        map(
            '_'.join,
            df.values.tolist()
            # when non-string columns are present:
            # df.values.astype(str).tolist()
        ),
        index=df.index
    )
    

    Comparison against @MaxU's answer (using the big data frame, which has both numeric and string columns):

    %timeit big['bar'].astype(str) + '_' + big['foo'] + '_' + big['new']
    # 29.4 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    
    %timeit Series(map('_'.join, big.values.astype(str).tolist()), index=big.index)
    # 27.4 ms ± 2.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

    Comparison against @derchambers's answer (using their df data frame, where all columns are strings):

    from functools import reduce
    
    def reduce_join(df, columns):
        slist = [df[x] for x in columns]
        return reduce(lambda x, y: x + '_' + y, slist[1:], slist[0])
    
    def list_map(df, columns):
        return Series(
            map(
                '_'.join,
                df[columns].values.tolist()
            ),
            index=df.index
        )
    
    %timeit df1 = reduce_join(df, list('1234'))
    # 602 ms ± 39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    %timeit df2 = list_map(df, list('1234'))
    # 351 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
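
    The df used in that last comparison is not shown here. As a rough sketch, the two helpers can be checked against each other on a small stand-in frame; the column names '1' to '4' come from the calls above, while the cell values are made up for illustration:

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-in for the all-string frame: columns '1'..'4'.
df = pd.DataFrame({c: [f'{c}{i}' for i in range(5)] for c in '1234'})

def reduce_join(df, columns):
    # Fold the columns left to right with vectorized string addition.
    slist = [df[x] for x in columns]
    return reduce(lambda x, y: x + '_' + y, slist[1:], slist[0])

def list_map(df, columns):
    # Join each row in plain Python; list() materializes the map iterator.
    rows = df[columns].values.tolist()
    return pd.Series(list(map('_'.join, rows)), index=df.index)

a = reduce_join(df, list('1234'))
b = list_map(df, list('1234'))

print(a.tolist() == b.tolist())  # True
print(a.iloc[0])                 # 10_20_30_40
```

    Both helpers assume every selected column already holds strings; with mixed dtypes, '_'.join raises a TypeError unless the values are cast first.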
    
  • 2020-12-02 15:03

    If you have even more columns you want to combine, using the Series method str.cat might be handy:

    df["combined"] = df["foo"].str.cat(df[["bar", "new"]].astype(str), sep="_")
    

    Basically, you select the first column (if it is not already of type str, you need to append .astype(str)), to which you append the other columns (separated by an optional separator character).
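
    To illustrate the note about a non-string first column, here is a small sketch that starts from the integer column instead, assuming the same example frame as the other answers:

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'],
                   'bar': [1, 2, 3],
                   'new': ['apple', 'banana', 'pear']})

# 'bar' is an integer column, so cast it to str before using the .str accessor.
df['combined'] = df['bar'].astype(str).str.cat(
    df[['foo', 'new']].astype(str), sep='_'
)

print(df['combined'].tolist())
# ['1_a_apple', '2_b_banana', '3_c_pear']
```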

  • 2020-12-02 15:03
    df['New_column_name'] = df['Column1'].map(str) + 'X' + df['Steps']
    

    Here 'X' stands for any delimiter (e.g. a space) that you want to place between the two merged columns.
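
    A minimal sketch with a space as the delimiter. The column names follow the line above; the data and the variable name sep are made up for illustration:

```python
import pandas as pd

# Hypothetical data: one numeric column and one string column.
df = pd.DataFrame({'Column1': [1, 2], 'Steps': ['start', 'stop']})

sep = ' '  # any delimiter works here
# map(str) on both sides keeps the expression valid even if 'Steps' is numeric.
df['New_column_name'] = df['Column1'].map(str) + sep + df['Steps'].map(str)

print(df['New_column_name'].tolist())
# ['1 start', '2 stop']
```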
