Compute dataframe columns from a string formula in variables?

纵饮孤独 提交于 2021-01-28 18:25:36

问题


I use an excel file in which I determine the names of sensor, and a formula allowing me to create a new "synthetic" sensor based on real sensors. I would like to write the formula as string like for example "y1 + y2 + y3" and not "df ['y1'] + df ['y2'] + df ['y3]" but I don't see which method to use?

Excel file example:

My script must therefore create a new sensor for each line of this excel file. This new sensor will then be uploaded to my database. The number of sensors to calculate the new value is variable!

Here is my code sample :

# From excel file
sensor_cell = '008253_sercit_sercit_batg_b_flr0_g_tctl_z1_tr_prval|officielles_darksky_bierset_temp|officielles_darksky_uccle_temp|005317_esa_001_hur_piece_030110'
formula_cell = "df['y1'] + df['y2'] + df['y3'] + df['y4']"
# formula_cell = 'y1+y2+y3+y4' --> what I would like to be able to write in my  excel file cell

list = sensor_cell.split('|')

df = []
for sensor in list:
    position = list.index(sensor) + 1
    df_y = search_ES(sensor) # Function that return a df with timestamp and value from my database
    df_y = df_y.rename(columns={'value': "y"+ str(position)})
    df.append(df_y)

df = pd.concat(df, axis=1, sort=True)
# I would like to have :
# y1 = df['y1']
# y2 = df['y2']
# y3 = df['y3']
# y4 = df['y4']
df = df.dropna()
print(df)
df['value'] = eval(formula_cell) # Formula from excel file
print(df)

df before applying the formula :

                           y1    y2    y3  y4
2019-12-11 00:00:00  20.500000  5.62  6.03  29
2019-12-11 01:00:00  21.180000  5.54  6.15  30
2019-12-11 02:00:00  21.020000  5.28  6.29  30
2019-12-11 03:00:00  20.760000  4.99  6.36  29
2019-12-11 04:00:00  20.680000  4.80  6.26  30
2019-12-11 05:00:00  20.760000  4.63  6.07  30
2019-12-11 06:00:00  20.900000  4.49  5.91  30
2019-12-11 07:00:00  20.920000  4.20  6.05  30
2019-12-11 08:00:00  21.320000  4.15  5.95  30
2019-12-11 09:00:00  21.840000  4.42  5.81  30
2019-12-11 10:00:00  22.460000  4.24  5.81  30
2019-12-11 11:00:00  22.240000  4.11  5.89  31
2019-12-11 12:00:00  22.420000  4.43  6.15  32
2019-12-11 13:00:00  21.740000  4.37  6.14  32
2019-12-11 14:00:00  22.500000  4.48  6.24  31
2019-12-11 15:00:00  22.980000  4.87  6.46  32
2019-12-11 16:00:00  22.420000  4.56  6.21  32
2019-12-11 17:00:00  22.320000  4.40  5.92  32
2019-12-11 18:00:00  21.939999  4.52  6.19  32
2019-12-11 19:00:00  20.680000  4.30  5.35  32
2019-12-11 20:00:00  20.900000  4.28  4.94  32
2019-12-11 21:00:00  20.859999  4.55  5.21  32
2019-12-11 22:00:00  20.520000  4.28  4.73  32
2019-12-11 23:00:00  20.320000  4.24  4.90  32

df after applying the formula :

                          y1    y2    y3  y4      value
2019-12-11 00:00:00  20.500000  5.62  6.03  29  61.150000
2019-12-11 01:00:00  21.180000  5.54  6.15  30  62.870000
2019-12-11 02:00:00  21.020000  5.28  6.29  30  62.590000
2019-12-11 03:00:00  20.760000  4.99  6.36  29  61.110000
2019-12-11 04:00:00  20.680000  4.80  6.26  30  61.740000
2019-12-11 05:00:00  20.760000  4.63  6.07  30  61.460000
2019-12-11 06:00:00  20.900000  4.49  5.91  30  61.300000
2019-12-11 07:00:00  20.920000  4.20  6.05  30  61.170000
2019-12-11 08:00:00  21.320000  4.15  5.95  30  61.420000
2019-12-11 09:00:00  21.840000  4.42  5.81  30  62.070000
2019-12-11 10:00:00  22.460000  4.24  5.81  30  62.510000
2019-12-11 11:00:00  22.240000  4.11  5.89  31  63.240000
2019-12-11 12:00:00  22.420000  4.43  6.15  32  65.000000
2019-12-11 13:00:00  21.740000  4.37  6.14  32  64.250000
2019-12-11 14:00:00  22.500000  4.48  6.24  31  64.220000
2019-12-11 15:00:00  22.980000  4.87  6.46  32  66.310000
2019-12-11 16:00:00  22.420000  4.56  6.21  32  65.190000
2019-12-11 17:00:00  22.320000  4.40  5.92  32  64.640000
2019-12-11 18:00:00  21.939999  4.52  6.19  32  64.649999
2019-12-11 19:00:00  20.680000  4.30  5.35  32  62.330000
2019-12-11 20:00:00  20.900000  4.28  4.94  32  62.120000
2019-12-11 21:00:00  20.859999  4.55  5.21  32  62.619999
2019-12-11 22:00:00  20.520000  4.28  4.73  32  61.530000
2019-12-11 23:00:00  20.320000  4.24  4.90  32  61.460000

EDIT - SOLUTION:

The problem was solved by using df.eval:

formula_cell = 'fictive = y1+y2+y3+y4' 
df.eval(formula_cell, inplace=True)

回答1:


I think pandas.query is what you want. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html

Example:

formula_cell = "y1 + y2 + y3 + y4"
df['value'] = df.query(formula_cell)


来源:https://stackoverflow.com/questions/59305007/compute-dataframe-columns-from-a-string-formula-in-variables

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!