Combine two columns of text in pandas dataframe

-上瘾入骨i 2020-11-22 01:32

I have a 20 x 4000 dataframe in Python using pandas. Two of these columns are named Year and quarter. I'd like to create a variable called p

18 answers
  •  南旧 (original poster)
    2020-11-22 01:43

    A more efficient approach is:

    def concat_df_str1(df):
        """ run time: 1.3416s """
        return pd.Series([''.join(row.astype(str)) for row in df.values], index=df.index)
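
    As a rough usage sketch, the function above can be applied to the two columns from the question. The sample values here are made up for illustration; only the column names Year and quarter come from the question.

```python
import pandas as pd


def concat_df_str1(df):
    # Join every value in a row into one string (same approach as above)
    return pd.Series([''.join(row.astype(str)) for row in df.values], index=df.index)


# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({'Year': [2000, 2001], 'quarter': ['q1', 'q2']})

# Select just the two columns to combine, in the order they should appear
combined = concat_df_str1(df[['Year', 'quarter']])
print(combined.tolist())
```

    Selecting the columns first keeps the function general: it joins whatever columns it is given, left to right.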
    

    And here is a timing test:

    import numpy as np
    import pandas as pd
    
    from time import time
    
    
    def concat_df_str1(df):
        """ run time: 1.3416s """
        return pd.Series([''.join(row.astype(str)) for row in df.values], index=df.index)
    
    
    def concat_df_str2(df):
        """ run time: 5.2758s """
        return df.astype(str).sum(axis=1)
    
    
    def concat_df_str3(df):
        """ run time: 5.0076s """
        df = df.astype(str)
        return df[0] + df[1] + df[2] + df[3] + df[4] + \
               df[5] + df[6] + df[7] + df[8] + df[9]
    
    
    def concat_df_str4(df):
        """ run time: 7.8624s """
        return df.astype(str).apply(lambda x: ''.join(x), axis=1)
    
    
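    def main():
        # Test data: 100,000 rows x 10 columns of integer zeros
        df = pd.DataFrame(np.zeros(1000000).reshape(100000, 10))
        df = df.astype(int)
    
        # Time one implementation; swap in concat_df_str1..3 to compare
        time1 = time()
        df_en = concat_df_str4(df)
        print('run time: %.4fs' % (time() - time1))
        print(df_en.head(10))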
    
    
    if __name__ == '__main__':
        main()
    

    Finally, note that when sum is used (concat_df_str2), the result is not simply the concatenated string; it is converted to an integer.
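
    If that integer conversion is a problem, one possible workaround is to add the string columns explicitly, as concat_df_str3 does, but without hard-coding the column names. The helper name concat_df_str_add and the sample data below are hypothetical, and this sketch has not been timed against the functions above.

```python
from functools import reduce
from operator import add

import pandas as pd


def concat_df_str_add(df):
    # Hypothetical helper: concat_df_str3 generalized to any number of columns.
    # Adding string Series concatenates them element-wise, left to right.
    df = df.astype(str)
    return reduce(add, (df[col] for col in df.columns))


# Made-up sample data with integer column labels, as in the timing test
df = pd.DataFrame({0: [0, 1], 1: [0, 2], 2: [0, 3]})
result = concat_df_str_add(df)
print(result.tolist())
```

    Because every step is a string addition, the values stay strings and leading zeros are preserved.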
