How to concatenate multiple column values into a single column in a pandas DataFrame

梦谈多话 2020-12-02 14:25

This question is similar to one posted earlier, but I want to concatenate three columns instead of two.

Here is the code for combining two columns:

         


        
11 answers
  • 2020-12-02 15:04

    You can simply do:

    In [17]: df['combined'] = df['bar'].astype(str) + '_' + df['foo'] + '_' + df['new']
    
    In [18]: df
    Out[18]: 
       bar foo     new    combined
    0    1   a   apple   1_a_apple
    1    2   b  banana  2_b_banana
    2    3   c    pear    3_c_pear
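
    As a self-contained variant of the same idea, the sketch below rebuilds the sample frame from the output above and joins the columns with `Series.str.cat`, which takes the separator once instead of repeating it between every pair of columns. The frame contents are copied from the printed output; using `str.cat` here is an alternative illustration, not part of the original answer.

```python
import pandas as pd

# Sample frame matching the output shown above.
df = pd.DataFrame({
    "bar": [1, 2, 3],
    "foo": ["a", "b", "c"],
    "new": ["apple", "banana", "pear"],
})

# Only 'bar' is numeric, so only it needs casting before the string join.
# str.cat joins the calling Series with each Series in the list, element-wise.
df["combined"] = df["bar"].astype(str).str.cat([df["foo"], df["new"]], sep="_")

print(df["combined"].tolist())  # ['1_a_apple', '2_b_banana', '3_c_pear']
```

    `str.cat` also accepts an `na_rep` argument, which may be useful if some of the columns contain missing values.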
    
  • 2020-12-02 15:06

    The answer given by @allen is reasonably generic but can be slow on larger DataFrames.

    Chaining vectorized additions with functools.reduce does a lot better:

    from functools import reduce
    
    import pandas as pd
    
    # make data
    df = pd.DataFrame(index=range(1_000_000))
    df['1'] = 'CO'
    df['2'] = 'BOB'
    df['3'] = '01'
    df['4'] = 'BILL'
    
    
    def reduce_join(df, columns):
        assert len(columns) > 1
        slist = [df[x].astype(str) for x in columns]
        return reduce(lambda x, y: x + '_' + y, slist[1:], slist[0])
    
    
    def apply_join(df, columns):
        assert len(columns) > 1
        return df[columns].apply(lambda row:'_'.join(row.values.astype(str)), axis=1)
    
    # ensure outputs are equal
    df1 = reduce_join(df, list('1234'))
    df2 = apply_join(df, list('1234'))
    assert df1.equals(df2)
    
    # profile
    %timeit df1 = reduce_join(df, list('1234'))  # 733 ms
    %timeit df2 = apply_join(df, list('1234'))   # 8.84 s
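
    `%timeit` only works inside IPython or Jupyter. A rough equivalent for a plain Python script, using the standard-library `timeit` module, might look like the sketch below. The frame is shrunk to 10,000 rows (an assumption made only to keep the run short), so the absolute numbers will not match the ones quoted above.

```python
import timeit
from functools import reduce

import pandas as pd

# Smaller frame than the 1,000,000 rows above so the sketch runs quickly.
df = pd.DataFrame({"1": "CO", "2": "BOB", "3": "01", "4": "BILL"}, index=range(10_000))
cols = list("1234")


def reduce_join(df, columns):
    # Vectorized: one Series addition per extra column.
    slist = [df[c].astype(str) for c in columns]
    return reduce(lambda x, y: x + "_" + y, slist[1:], slist[0])


def apply_join(df, columns):
    # Row-wise: one Python-level join per row.
    return df[columns].apply(lambda row: "_".join(row.values.astype(str)), axis=1)


# Total wall-clock time for three calls of each function.
t_reduce = timeit.timeit(lambda: reduce_join(df, cols), number=3)
t_apply = timeit.timeit(lambda: apply_join(df, cols), number=3)
print(f"reduce: {t_reduce:.3f} s, apply: {t_apply:.3f} s")
```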
    
    
  • 2020-12-02 15:11

    Another solution uses DataFrame.apply(); it takes slightly less typing and is easier to extend when you want to join more columns:

    cols = ['foo', 'bar', 'new']
    df['combined'] = df[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)
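
    A closely related spelling casts the whole selection to strings first and then passes `'_'.join` directly to `DataFrame.agg`. The sketch below assumes the same three sample columns used elsewhere in this thread. It is still a row-wise operation, so it should not be expected to be faster than the `apply` version.

```python
import pandas as pd

# Sample data assumed for illustration (same values as the first answer).
df = pd.DataFrame({
    "foo": ["a", "b", "c"],
    "bar": [1, 2, 3],
    "new": ["apple", "banana", "pear"],
})

cols = ["foo", "bar", "new"]

# Cast every selected column to str, then join each row's values with '_'.
df["combined"] = df[cols].astype(str).agg("_".join, axis=1)

print(df["combined"].tolist())  # ['a_1_apple', 'b_2_banana', 'c_3_pear']
```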
    
  • 2020-12-02 15:14

    @derchambers I found one more solution:

    import pandas as pd
    
    # make data
    df = pd.DataFrame(index=range(1_000_000))
    df['1'] = 'CO'
    df['2'] = 'BOB'
    df['3'] = '01'
    df['4'] = 'BILL'
    
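    def eval_join(df, columns):
        # Build an expression such as "df['1'] + '_' + df['2'] + ..." and evaluate it.
        # Assumes every listed column already holds strings.
        sum_elements = [f"df['{col}']" for col in columns]
        to_eval = " + '_' + ".join(sum_elements)
        return eval(to_eval)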
    
    
    #profile
    %timeit df3 = eval_join(df, list('1234')) # 504 ms
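
    If building and evaluating a code string is undesirable, a plain list comprehension over the zipped columns is one alternative that avoids `eval`. The helper name `zip_join` and its `sep` parameter are hypothetical, introduced only for this sketch, and no timing is claimed for it.

```python
import pandas as pd

# Small frame with the same constant columns as above (5 rows instead of 1,000,000).
df = pd.DataFrame({"1": "CO", "2": "BOB", "3": "01", "4": "BILL"}, index=range(5))


def zip_join(df, columns, sep="_"):
    # Hypothetical helper: cast each column to str, then join values row by row.
    as_str = [df[c].astype(str) for c in columns]
    joined = [sep.join(values) for values in zip(*as_str)]
    # Reuse the original index so the result aligns with df.
    return pd.Series(joined, index=df.index)


result = zip_join(df, list("1234"))
print(result.iloc[0])  # CO_BOB_01_BILL
```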
    
  • 2020-12-02 15:15

    If you have a list of columns to concatenate and want to choose the separator, here's what you can do:

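    def concat_columns(df, cols_to_concat, new_col_name, sep=" "):
        # Adds the new column to df in place; nothing is returned.
        # Cast the first column too, so a single-column list still yields strings.
        df[new_col_name] = df[cols_to_concat[0]].astype(str)
        for col in cols_to_concat[1:]:
            df[new_col_name] = df[new_col_name] + sep + df[col].astype(str)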
    
    

    This should be faster than apply, and it accepts an arbitrary number of columns to concatenate.
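
    A short usage sketch follows. The function is repeated so the snippet runs on its own, and the sample data is assumed (the same three columns used in the first answer). Note that the function modifies `df` in place rather than returning a new frame.

```python
import pandas as pd


def concat_columns(df, cols_to_concat, new_col_name, sep=" "):
    # Start from the first column as strings, then append the rest one by one.
    df[new_col_name] = df[cols_to_concat[0]].astype(str)
    for col in cols_to_concat[1:]:
        df[new_col_name] = df[new_col_name] + sep + df[col].astype(str)


# Sample data assumed for illustration.
df = pd.DataFrame({
    "bar": [1, 2, 3],
    "foo": ["a", "b", "c"],
    "new": ["apple", "banana", "pear"],
})

concat_columns(df, ["bar", "foo", "new"], "combined", sep="_")
print(df["combined"].tolist())  # ['1_a_apple', '2_b_banana', '3_c_pear']
```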
