Rename specific column(s) in pandas

不知归路 2020-11-28 01:02

I've got a dataframe called data. How would I rename only one column header? For example, gdp to log(gdp)?

data =
         


        
5 Answers
  •  慢半拍i (original poster)
    2020-11-28 01:41

    There are at least five different ways to rename specific columns in pandas. I have listed them below along with links to the original answers. I also timed these methods and found them to perform about the same (though YMMV depending on your data set and scenario). The test case below renames columns A, M, N, Z to A2, M2, N2, Z2 in a dataframe with columns A to Z and one million rows.

    # Import required modules
    import numpy as np
    import pandas as pd
    import timeit
    
    # Create sample data
    df = pd.DataFrame(np.random.randint(0,9999,size=(1000000, 26)), columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
    
    # Standard way - https://stackoverflow.com/a/19758398/452587
    def method_1():
        df_renamed = df.rename(columns={'A': 'A2', 'M': 'M2', 'N': 'N2', 'Z': 'Z2'})
    
    # Lambda function - https://stackoverflow.com/a/16770353/452587
    def method_2():
        df_renamed = df.rename(columns=lambda x: x + '2' if x in ['A', 'M', 'N', 'Z'] else x)
    
    # Mapping function - https://stackoverflow.com/a/19758398/452587
    def rename_some(x):
        if x=='A' or x=='M' or x=='N' or x=='Z':
            return x + '2'
        return x
    def method_3():
        df_renamed = df.rename(columns=rename_some)
    
    # Dictionary comprehension (selects columns by substring match) - https://stackoverflow.com/a/58143182/452587
    def method_4():
        df_renamed = df.rename(columns={col: col + '2' for col in df.columns[
            np.asarray([i for i, col in enumerate(df.columns) if 'A' in col or 'M' in col or 'N' in col or 'Z' in col])
        ]})
    
    # Dictionary built with zip - https://stackoverflow.com/a/38101084/452587
    def method_5():
        df_renamed = df.rename(columns=dict(zip(df[['A', 'M', 'N', 'Z']], ['A2', 'M2', 'N2', 'Z2'])))
    
    print('Method 1:', timeit.timeit(method_1, number=10))
    print('Method 2:', timeit.timeit(method_2, number=10))
    print('Method 3:', timeit.timeit(method_3, number=10))
    print('Method 4:', timeit.timeit(method_4, number=10))
    print('Method 5:', timeit.timeit(method_5, number=10))
    

    Output:

    Method 1: 3.650640267
    Method 2: 3.163998427
    Method 3: 2.998530871
    Method 4: 2.9918436889999995
    Method 5: 3.2436501520000007
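
    Since the timings are so close, the choice mostly comes down to readability. As a quick sanity check that the approaches are interchangeable, here is a minimal sketch on a small stand-in frame (5 rows instead of a million; the names small, targets and expected are only for illustration, and the comprehension is a simplified variant of method 4):

```python
import numpy as np
import pandas as pd

# Small stand-in for the benchmark frame: same 26 column labels, only 5 rows.
letters = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
small = pd.DataFrame(np.zeros((5, 26), dtype=int), columns=letters)

targets = ['A', 'M', 'N', 'Z']

# Explicit dictionary (as in method 1)
by_dict = small.rename(columns={'A': 'A2', 'M': 'M2', 'N': 'N2', 'Z': 'Z2'})
# Callable applied to every label (as in methods 2 and 3)
by_func = small.rename(columns=lambda x: x + '2' if x in targets else x)
# Dictionary built programmatically (simplified form of methods 4 and 5)
by_comp = small.rename(columns={col: col + '2' for col in targets})

# All three should yield the same labels, with only the targets changed.
expected = [c + '2' if c in targets else c for c in letters]
same_labels = (
    list(by_dict.columns) == expected
    and list(by_func.columns) == expected
    and list(by_comp.columns) == expected
)
print(same_labels)  # True
```

    In each case rename returns a new dataframe, so small itself keeps its original labels.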
    

    Use the method that is most intuitive to you and easiest for you to implement in your application.
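
    Applied to the question itself, the dictionary form is probably the most direct. A minimal sketch, assuming a small made-up frame since the asker's data is not shown (the y and cap columns and all values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the asker's `data`; only the 'gdp' column name comes from the question.
data = pd.DataFrame({'y': [1, 8, 3], 'gdp': [2, 3, 7], 'cap': [5, 9, 2]})

# rename returns a new DataFrame; only the 'gdp' label changes, the values are untouched.
data = data.rename(columns={'gdp': 'log(gdp)'})
print(list(data.columns))  # ['y', 'log(gdp)', 'cap']
```

    Note that this only changes the header. It does not take the logarithm of the values.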
