I often have a big DataFrame df that holds the basic data, and I need to create many more columns to hold derived data calculated from the basic columns.
You could subclass DataFrame, and add the column as a property. For example,
import pandas as pd

class LazyFrame(pd.DataFrame):
    @property
    def derivative_col1(self):
        self['derivative_col1'] = result = self['basic_col1'] + self['basic_col2']
        return result

x = LazyFrame({'basic_col1': [1, 2, 3],
               'basic_col2': [4, 5, 6]})
print(x)
#    basic_col1  basic_col2
# 0           1           4
# 1           2           5
# 2           3           6
Accessing the property (via x.derivative_col1, below) calls the derivative_col1 function defined in LazyFrame. This function computes the result and adds the derived column to the LazyFrame instance:
print(x.derivative_col1)
# 0    5
# 1    7
# 2    9
print(x)
#    basic_col1  basic_col2  derivative_col1
# 0           1           4                5
# 1           2           5                7
# 2           3           6                9
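To make the lazy part explicit, here is a small self-contained sketch that repeats the LazyFrame class from above and compares the column list before and after the first property access. The names before, values and after are only for illustration.

```python
import pandas as pd

class LazyFrame(pd.DataFrame):
    @property
    def derivative_col1(self):
        # Compute the derived values and store them as a real column.
        self['derivative_col1'] = result = self['basic_col1'] + self['basic_col2']
        return result

x = LazyFrame({'basic_col1': [1, 2, 3],
               'basic_col2': [4, 5, 6]})

# Before the first access, only the basic columns exist.
before = list(x.columns)

# Accessing the property runs the computation and the assignment.
values = x.derivative_col1

# Now the derived column is part of the frame.
after = list(x.columns)

print(before)
print(after)
```

The property wins over pandas' usual column-as-attribute lookup because Python only falls back to that lookup when normal attribute resolution fails, so x.derivative_col1 keeps calling the function even after the column exists.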
Note that if you modify a basic column:
x['basic_col1'] *= 10
the derived column is not automatically updated:
print(x['derivative_col1'])
# 0    5
# 1    7
# 2    9
But if you access the property, the values are recomputed:
print(x.derivative_col1)
# 0    14
# 1    25
# 2    36
print(x)
#    basic_col1  basic_col2  derivative_col1
# 0          10           4               14
# 1          20           5               25
# 2          30           6               36
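One caveat with subclassing that the example above does not cover: many pandas operations (copy, boolean indexing, and so on) build their result through the _constructor property, which returns a plain DataFrame by default, so the result would lose derivative_col1 as a property. The sketch below is one possible way to keep the subclass, assuming you want filtered copies to stay LazyFrame instances. The variable subset is just an illustrative name.

```python
import pandas as pd

class LazyFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # pandas uses this to build results that have the same
        # dimensionality as the original (copies, row selections, ...).
        return LazyFrame

    @property
    def derivative_col1(self):
        self['derivative_col1'] = result = self['basic_col1'] + self['basic_col2']
        return result

x = LazyFrame({'basic_col1': [1, 2, 3],
               'basic_col2': [4, 5, 6]})

# Filter the rows, then copy so the new column is written to an
# independent frame rather than to a view of x.
subset = x[x['basic_col1'] > 1].copy()

print(type(subset).__name__)
print(subset.derivative_col1.tolist())
```

Without the _constructor override, subset would be an ordinary DataFrame and subset.derivative_col1 would raise an AttributeError until the column had been created some other way.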