So let's say I have a DataFrame in pandas with m rows and n columns. Let's also say that I want to reverse the order of the columns, which can be done with df[df.columns[::-1]].
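A minimal sketch of the reversal on a small frame. It uses the label-based df[df.columns[::-1]] indexing and, for comparison, the positional df.iloc[:, ::-1] slice. The iloc variant is an addition for illustration, not part of the original question. Both produce the same column order and values.

```python
import numpy as np
import pandas as pd

# Small example frame: 2 rows, 4 columns labelled 'a'..'d'
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=list("abcd"))

# Label-based reversal: index the frame with the reversed column labels
by_label = df[df.columns[::-1]]

# Positional alternative: slice the column axis backwards
by_position = df.iloc[:, ::-1]

print(list(by_label.columns))         # ['d', 'c', 'b', 'a']
print(by_label.equals(by_position))   # True
```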
I ran an empirical test using the big_O fitting library.
Note: all tests sweep the independent variable over 6 orders of magnitude, i.e. rows from 10 to 10^6 with a constant column count of 3, and columns from 10 to 10^6 with a constant row count of 10.
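The two sweeps can be spelled out explicitly. This sketch mirrors the shapes produced by get_row_df and get_column_df in the code below, written as plain (rows, columns) tuples so it runs without pandas. SWEEP_EXPONENTS, CONSTANT_COLUMNS and CONSTANT_ROWS are hypothetical names for illustration. The exponent n runs from 1 to 6, while the swept dimension is 10**n.

```python
SWEEP_EXPONENTS = range(1, 7)   # n = 1 .. 6, six orders of magnitude
CONSTANT_COLUMNS = 3            # column count held fixed while rows grow
CONSTANT_ROWS = 10              # row count held fixed while columns grow

# (rows, columns) shapes for each sweep; the swept dimension is 10**n
row_sweep = [(10**n, CONSTANT_COLUMNS) for n in SWEEP_EXPONENTS]
column_sweep = [(CONSTANT_ROWS, 10**n) for n in SWEEP_EXPONENTS]

for n, r_shape, c_shape in zip(SWEEP_EXPONENTS, row_sweep, column_sweep):
    print(f"n={n}: rows sweep {r_shape}, columns sweep {c_shape}")
```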
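The results show that big_O's best fit for the column-reversal operation .columns[::-1] is cubic, O(n^3), both when n drives the number of rows and when n drives the number of columns. Keep in mind that in the code below, n is the exponent passed to the data generators (the DataFrame has 10^n rows or 10^n columns), so this n^3 is cubic in log10 of the row or column count rather than in the count itself.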
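To estimate how the reversal time grows with the actual row count, rather than with the exponent n that get_row_df receives, one option is to time the operation at a few sizes and take the slope of log(time) against log(size). A slope near 1 suggests linear growth, near 2 quadratic, and so on. This is a sketch using only timeit and NumPy, not the big_O library. time_flip and loglog_slope are hypothetical helper names, and the slope is only a rough empirical estimate that depends on the sizes chosen and on timing noise.

```python
import timeit

import numpy as np
import pandas as pd


def flip_columns(df):
    # Same operation as in the answer: select the columns in reverse label order
    return df[df.columns[::-1]]


def time_flip(rows, columns, repeats=5):
    """Best-of-`repeats` wall time (seconds) of one flip_columns call."""
    df = pd.DataFrame(np.zeros((rows, columns)))  # built outside the timed call
    return min(timeit.repeat(lambda: flip_columns(df), number=1, repeat=repeats))


def loglog_slope(sizes, times):
    """Least-squares slope k of log(time) vs log(size), i.e. time ~ size**k."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(times), 1)
    return slope


if __name__ == "__main__":
    sizes = [10**n for n in range(3, 7)]  # actual row counts, not exponents
    times = [time_flip(rows, 3) for rows in sizes]
    print(f"empirical exponent vs rows: {loglog_slope(sizes, times):.2f}")
```

Taking the minimum over repeats, rather than the mean, is a common way to reduce the effect of background noise on short timings.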
Prerequisites: you will need to install the big_O library from a terminal with: pip install big_o
Code
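import big_o
import pandas as pd
import numpy as np

SWEAP_LOG10 = 6  # largest exponent: swept sizes go from 10**1 to 10**6
COLUMNS = 3      # fixed column count for the rows sweep
ROWS = 10        # fixed row count for the columns sweep

def build_df(rows, columns):
    # Build the DataFrame separately so its creation is not timed with the reversal.
    narray = np.zeros(rows*columns).reshape(rows, columns)
    df = pd.DataFrame(narray)
    return df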
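def flip_columns(df):
    # Reverse the column order by indexing with the reversed column labels
    return df[df.columns[::-1]]

def get_row_df(n, m=COLUMNS):
    # big_O passes the exponent n: the frame has 10**n rows and m columns
    return build_df(10**n, m)

def get_column_df(n, m=ROWS):
    # big_O passes the exponent n: the frame has m rows and 10**n columns
    return build_df(m, 10**n)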
# Infer the complexity of the columns[::-1] operation as the row count grows
best, others = big_o.big_o(flip_columns, get_row_df, min_n=1, max_n=SWEAP_LOG10,
                           n_measures=SWEAP_LOG10, n_repeats=10)
# Print the best fit, then the residual of every candidate class
print('Measuring .columns[::-1] complexity against rapid increase in # rows')
print('-'*80 + '\nBig O() fits: {}\n'.format(best) + '-'*80)
for class_, residual in others.items():
    print('{:<60s} (res: {:.2G})'.format(str(class_), residual))
print('-'*80)
# Infer the complexity of the columns[::-1] operation as the column count grows
best, others = big_o.big_o(flip_columns, get_column_df, min_n=1, max_n=SWEAP_LOG10,
                           n_measures=SWEAP_LOG10, n_repeats=10)
# Print the best fit, then the residual of every candidate class
print()
print('Measuring .columns[::-1] complexity against rapid increase in # columns')
print('-'*80 + '\nBig O() fits: {}\n'.format(best) + '-'*80)
for class_, residual in others.items():
    print('{:<60s} (res: {:.2G})'.format(str(class_), residual))
print('-'*80)
Results
Measuring .columns[::-1] complexity against rapid increase in # rows
--------------------------------------------------------------------------------
Big O() fits: Cubic: time = -0.017 + 0.00067*n^3
--------------------------------------------------------------------------------
Constant: time = 0.032 (res: 0.021)
Linear: time = -0.051 + 0.024*n (res: 0.011)
Quadratic: time = -0.026 + 0.0038*n^2 (res: 0.0077)
Cubic: time = -0.017 + 0.00067*n^3 (res: 0.0052)
Polynomial: time = -6.3 * x^1.5 (res: 6)
Logarithmic: time = -0.026 + 0.053*log(n) (res: 0.015)
Linearithmic: time = -0.024 + 0.012*n*log(n) (res: 0.0094)
Exponential: time = -7 * 0.66^n (res: 3.6)
--------------------------------------------------------------------------------
Measuring .columns[::-1] complexity against rapid increase in # columns
--------------------------------------------------------------------------------
Big O() fits: Cubic: time = -0.28 + 0.009*n^3
--------------------------------------------------------------------------------
Constant: time = 0.38 (res: 3.9)
Linear: time = -0.73 + 0.32*n (res: 2.1)
Quadratic: time = -0.4 + 0.052*n^2 (res: 1.5)
Cubic: time = -0.28 + 0.009*n^3 (res: 1.1)
Polynomial: time = -6 * x^2.2 (res: 16)
Logarithmic: time = -0.39 + 0.71*log(n) (res: 2.8)
Linearithmic: time = -0.38 + 0.16*n*log(n) (res: 1.8)
Exponential: time = -7 * 1^n (res: 9.7)
--------------------------------------------------------------------------------
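In both outputs above, the class on the "Big O() fits" line is also the one with the smallest residual. A sketch of that check on the rows-sweep residuals: the dictionary below is hand-transcribed from the first table and is not big_O's own return type.

```python
# Residuals transcribed by hand from the rows-sweep output above
rows_residuals = {
    "Constant": 0.021,
    "Linear": 0.011,
    "Quadratic": 0.0077,
    "Cubic": 0.0052,
    "Polynomial": 6,
    "Logarithmic": 0.015,
    "Linearithmic": 0.0094,
    "Exponential": 3.6,
}

# The class with the lowest residual fits the six measured points best
best_class = min(rows_residuals, key=rows_residuals.get)
print(best_class)  # Cubic
```

The Cubic and Quadratic residuals are close (0.0052 vs 0.0077), so with only six measured sizes the ranking could plausibly change between runs.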