What is the Big O Complexity of Reversing the Order of Columns in Pandas DataFrame?

北恋 2020-12-03 17:30

So let's say I have a pandas DataFrame with m rows and n columns. Let's also say that I want to reverse the order of the columns, which can be done with the following:
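
A minimal sketch of two common ways to reverse column order, assuming a small made-up DataFrame (the names df, by_label and by_position are illustrative, not taken from the question):

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Label-based: reverse the column Index, then select columns in that order.
by_label = df[df.columns[::-1]]

# Position-based: keep all rows and step backwards through the columns.
by_position = df.iloc[:, ::-1]

print(list(by_label.columns))        # ['c', 'b', 'a']
print(by_label.equals(by_position))  # True
```

Both forms return a new DataFrame and leave df unchanged.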

3 Answers
  •  温柔的废话
    2020-12-03 18:04

    I ran an empirical test using the big_o fitting library.

    Note: each test sweeps one independent variable across powers of ten, from 10 to 10^6, while the other dimension is held constant:

    • rows from 10 to 10^6 vs. constant column size of 3,
    • columns from 10 to 10^6 vs. constant row size of 10
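
    To make the sweep concrete, here is a small sketch of the table shapes it covers. It assumes, as the measurement code in this answer does, that each sample point is an exponent k and the swept dimension has 10**k entries; exponents, row_shapes and column_shapes are illustrative names.

```python
COLUMNS = 3   # fixed column count for the row sweep
ROWS = 10     # fixed row count for the column sweep

exponents = range(1, 7)  # k = 1..6, i.e. sizes 10 .. 10**6

# (rows, columns) shape of the DataFrame built at each sample point.
row_shapes = [(10**k, COLUMNS) for k in exponents]
column_shapes = [(ROWS, 10**k) for k in exponents]

print(row_shapes[0], row_shapes[-1])        # (10, 3) (1000000, 3)
print(column_shapes[0], column_shapes[-1])  # (10, 10) (10, 1000000)
```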

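    The fit reports the following complexity for the column reversal df[df.columns[::-1]]:

    1. Cubic: O(n^3) for the row sweep
    2. Cubic: O(n^3) for the column sweep

    Note that the data generators in the code below are called with the exponent n and build 10^n rows or columns, so the fitted n is that exponent rather than the raw row or column count.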

    Prerequisites: install the big_o package from a terminal with pip install big_o

    Code

    import big_o
    import pandas as pd
    import numpy as np
    
    SWEAP_LOG10 = 6
    COLUMNS = 3
    ROWS = 10
    
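    def build_df(rows, columns):
        # Build the DataFrame separately so its creation is isolated
        # from the reversal operation being measured.
        narray = np.zeros((rows, columns))
        df = pd.DataFrame(narray)
        return df
    
    def flip_columns(df):
        # The operation under test: select columns in reversed order.
        return df[df.columns[::-1]]
    
    def get_row_df(n, m=COLUMNS):
        # n is the sweep exponent: the frame has 10**n rows and m columns.
        return build_df(10**n, m)
    
    def get_column_df(n, m=ROWS):
        # n is the sweep exponent: the frame has m rows and 10**n columns.
        return build_df(m, 10**n)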
    
    
    # infer the big_o class of the columns[::-1] operation vs. rows
    best, others = big_o.big_o(flip_columns, get_row_df, min_n=1, max_n=SWEAP_LOG10,
                               n_measures=SWEAP_LOG10, n_repeats=10)
    
    # print results
    print('Measuring .columns[::-1] complexity against rapid increase in # rows')
    print('-'*80 + '\nBig O() fits: {}\n'.format(best) + '-'*80)
    
    for class_, residual in others.items():
        print('{:<60s}  (res: {:.2G})'.format(str(class_), residual))
    
    print('-'*80)
    
    # infer the big_o class of the columns[::-1] operation vs. columns
    best, others = big_o.big_o(flip_columns, get_column_df, min_n=1, max_n=SWEAP_LOG10,
                               n_measures=SWEAP_LOG10, n_repeats=10)
    
    # print results
    print()
    print('Measuring .columns[::-1] complexity against rapid increase in # columns')
    print('-'*80 + '\nBig O() fits: {}\n'.format(best) + '-'*80)
    
    for class_, residual in others.items():
        print('{:<60s}  (res: {:.2G})'.format(str(class_), residual))
        
    print('-'*80)
    

    Results

    Measuring .columns[::-1] complexity against rapid increase in # rows
    --------------------------------------------------------------------------------
    Big O() fits: Cubic: time = -0.017 + 0.00067*n^3
    --------------------------------------------------------------------------------
    Constant: time = 0.032                                        (res: 0.021)
    Linear: time = -0.051 + 0.024*n                               (res: 0.011)
    Quadratic: time = -0.026 + 0.0038*n^2                         (res: 0.0077)
    Cubic: time = -0.017 + 0.00067*n^3                            (res: 0.0052)
    Polynomial: time = -6.3 * x^1.5                               (res: 6)
    Logarithmic: time = -0.026 + 0.053*log(n)                     (res: 0.015)
    Linearithmic: time = -0.024 + 0.012*n*log(n)                  (res: 0.0094)
    Exponential: time = -7 * 0.66^n                               (res: 3.6)
    --------------------------------------------------------------------------------
    
    
    Measuring .columns[::-1] complexity against rapid increase in # columns
    --------------------------------------------------------------------------------
    Big O() fits: Cubic: time = -0.28 + 0.009*n^3
    --------------------------------------------------------------------------------
    Constant: time = 0.38                                         (res: 3.9)
    Linear: time = -0.73 + 0.32*n                                 (res: 2.1)
    Quadratic: time = -0.4 + 0.052*n^2                            (res: 1.5)
    Cubic: time = -0.28 + 0.009*n^3                               (res: 1.1)
    Polynomial: time = -6 * x^2.2                                 (res: 16)
    Logarithmic: time = -0.39 + 0.71*log(n)                       (res: 2.8)
    Linearithmic: time = -0.38 + 0.16*n*log(n)                    (res: 1.8)
    Exponential: time = -7 * 1^n                                  (res: 9.7)
    --------------------------------------------------------------------------------
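
    One way to cross-check a fitted class is to time the operation against the actual table size and look at the slope on a log-log scale: a slope near 1 would suggest roughly linear growth, a slope near 3 cubic growth. The sketch below is only an illustration under stated assumptions and is not part of the measurements above; time_flip and loglog_slope are hypothetical helper names, and the sizes are kept small so it runs quickly.

```python
import timeit

import numpy as np
import pandas as pd


def flip_columns(df):
    # Same operation as in the answer: select columns in reversed order.
    return df[df.columns[::-1]]


def time_flip(rows, columns, repeats=5):
    """Best-of-`repeats` wall time in seconds for one column reversal."""
    df = pd.DataFrame(np.zeros((rows, columns)))  # built outside the timed call
    return min(timeit.repeat(lambda: flip_columns(df), number=1, repeat=repeats))


def loglog_slope(sizes, times):
    """Least-squares slope of log(time) vs. log(size): the p in time ~ size**p."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(times), 1)
    return slope


if __name__ == "__main__":
    sizes = [10**k for k in range(2, 6)]  # 100 .. 100_000 rows, 3 columns
    times = [time_flip(rows=s, columns=3) for s in sizes]
    print("estimated exponent vs. rows:", round(loglog_slope(sizes, times), 2))
```

    Fixed per-call overhead tends to dominate for small tables, so the estimated exponent is only meaningful once the sizes are large enough for the copy itself to take most of the time.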
    
