Question
I have a dataframe, produced by this piece of code:
import numpy as np
import pandas as pd
import datetime as dt
datarange=pd.date_range('01-05-2018 00:00:00', periods=50, freq="4H")
range_series_1=pd.Series(np.random.randint(-5,3,size=50).astype(float), index=datarange)
range_series_2=pd.Series(np.random.randint(5,9,size=50).astype(float), index=datarange)
frame=pd.DataFrame({'value1':range_series_1, 'value2':range_series_2})
frame.index.name='datetime'
I need two extra columns (e.g. 'column 1' and 'column 2'), which host the following calculations:
column 1 = value1[every day @ 4:00] - value1[same day @ 12:00] --> the value should be placed at 00:00 of the day
column 2 = value1[every day @ 4:00] - value2[same day @ 12:00] --> the value should be placed at 00:00 of the day
The rest of the rows in column 1 and column 2 should be either NaN or 0.
Is there an idiomatic pandas way to do this?
Additional information: my real dataframe may not have timestamps at such strictly regular intervals, so DataFrame.shift() may not be a general solution.
An example of the desired output is the following (with comments to be clearer):
                     value1  value2  column1  column2
datetime
2018-01-05 00:00:00    -3.0     8.0      2.0     -8.0  # 2.0 = -3.0 - (-5.0) and -8.0 = -3.0 - (5.0)
2018-01-05 04:00:00    -3.0     8.0      NaN      NaN
2018-01-05 08:00:00     1.0     8.0      NaN      NaN
2018-01-05 12:00:00    -5.0     5.0      NaN      NaN
2018-01-05 16:00:00     0.0     5.0      NaN      NaN
2018-01-05 20:00:00    -4.0     8.0      NaN      NaN
2018-01-06 00:00:00    -5.0     8.0     -1.0    -11.0  # -1.0 = -5.0 - (-4.0) and -11.0 = -5.0 - (6.0)
2018-01-06 04:00:00    -5.0     5.0      NaN      NaN
2018-01-06 08:00:00    -2.0     5.0      NaN      NaN
2018-01-06 12:00:00    -4.0     6.0      NaN      NaN
2018-01-06 16:00:00    -1.0     6.0      NaN      NaN
2018-01-06 20:00:00     1.0     7.0      NaN      NaN
2018-01-07 00:00:00    -1.0     5.0      4.0     -4.0  # 4.0 = 2.0 - (-2.0) and -4.0 = 2.0 - (6.0)
2018-01-07 04:00:00     2.0     8.0      NaN      NaN
2018-01-07 08:00:00    -1.0     8.0      NaN      NaN
2018-01-07 12:00:00    -2.0     6.0      NaN      NaN
2018-01-07 16:00:00     2.0     6.0      NaN      NaN
2018-01-07 20:00:00    -4.0     6.0      NaN      NaN
I am trying to work out whether the answer to "Update a dataframe in pandas while iterating row by row" can help me, but I am not yet very familiar with iterating over rows.
Answer 1:
Use:
import numpy as np
import pandas as pd
from datetime import time
np.random.seed(2019)
datarange=pd.date_range('01-05-2018 00:00:00', periods=50, freq="4H")
range_series_1=pd.Series(np.random.randint(-5,3,size=50).astype(float), index=datarange)
range_series_2=pd.Series(np.random.randint(5,9,size=50).astype(float), index=datarange)
frame=pd.DataFrame({'value1':range_series_1, 'value2':range_series_2})
frame.index.name='datetime'
#print (frame)
The idea is to compare the time component of the index against the target times, then use DatetimeIndex.floor to strip the time part (leaving the default 00:00:00), for both the Series and the DataFrame:
s = frame.loc[frame.index.time == time(4, 0), 'value1']
s.index = s.index.floor('d')
print (s)
datetime
2018-01-05 -3.0
2018-01-06 -5.0
2018-01-07 -5.0
2018-01-08 -5.0
2018-01-09 -1.0
2018-01-10 -4.0
2018-01-11 -2.0
2018-01-12 0.0
2018-01-13 1.0
Name: value1, dtype: float64
df1 = frame.loc[frame.index.time == time(12, 0), ['value1', 'value2']]
df1.index = df1.index.floor('d')
print (df1)
            value1  value2
datetime
2018-01-05     0.0     6.0
2018-01-06     2.0     8.0
2018-01-07    -5.0     7.0
2018-01-08     2.0     7.0
2018-01-09    -1.0     5.0
2018-01-10     1.0     7.0
2018-01-11     2.0     7.0
2018-01-12    -2.0     6.0
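As a quick check on the question's concern about irregular timestamps, here is a minimal sketch of the same selection step on a made-up index where the second day has no 04:00 row. The data and the name `idx` are hypothetical, and `floor("D")` is used here in place of `floor('d')`.

```python
import pandas as pd
from datetime import time

# Hypothetical irregular index: the second day has no 04:00 row.
idx = pd.to_datetime([
    "2018-01-05 00:00", "2018-01-05 04:00", "2018-01-05 12:00",
    "2018-01-06 00:00", "2018-01-06 12:00",
])
frame = pd.DataFrame({"value1": [1.0, -3.0, -5.0, 2.0, -4.0]}, index=idx)

# Boolean mask on the time of day, then drop the time part so the
# selected values are keyed by the date (midnight timestamps).
s = frame.loc[frame.index.time == time(4, 0), "value1"]
s.index = s.index.floor("D")
print(s)
```

Only 2018-01-05 ends up in `s`. A day without a 04:00 row simply drops out of the selection, so it would show NaN after the later join rather than picking up a value from the wrong row.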
Then subtract from the right-hand side with DataFrame.rsub (which computes s - df1, aligned on the index), add a prefix to the new columns, and join the result to the original frame:
frame = frame.join(df1.rsub(s, axis=0).add_prefix('new_'))
print (frame.head(15))
                     value1  value2  new_value1  new_value2
datetime
2018-01-05 00:00:00    -5.0     6.0        -3.0        -9.0
2018-01-05 04:00:00    -3.0     5.0         NaN         NaN
2018-01-05 08:00:00     2.0     7.0         NaN         NaN
2018-01-05 12:00:00     0.0     6.0         NaN         NaN
2018-01-05 16:00:00    -5.0     7.0         NaN         NaN
2018-01-05 20:00:00     1.0     6.0         NaN         NaN
2018-01-06 00:00:00     1.0     5.0        -7.0       -13.0
2018-01-06 04:00:00    -5.0     8.0         NaN         NaN
2018-01-06 08:00:00     0.0     6.0         NaN         NaN
2018-01-06 12:00:00     2.0     8.0         NaN         NaN
2018-01-06 16:00:00    -1.0     8.0         NaN         NaN
2018-01-06 20:00:00    -3.0     8.0         NaN         NaN
2018-01-07 00:00:00    -5.0     5.0         0.0       -12.0
2018-01-07 04:00:00    -5.0     8.0         NaN         NaN
2018-01-07 08:00:00     2.0     5.0         NaN         NaN
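The steps above can be gathered into one helper. This is a sketch rather than part of the original answer: the function name `daily_diff` and its parameters are hypothetical, and the small frame reuses the first two days from the question's desired output so the result can be checked by hand.

```python
import pandas as pd
from datetime import time


def daily_diff(frame, minuend_col, minuend_time, subtrahend_time):
    """Per day: frame[minuend_col] at minuend_time minus every column at
    subtrahend_time, indexed by midnight of that day. (Hypothetical helper.)"""
    s = frame.loc[frame.index.time == minuend_time, minuend_col]
    s.index = s.index.floor("D")
    other = frame.loc[frame.index.time == subtrahend_time].copy()
    other.index = other.index.floor("D")
    # rsub computes s - other, matching s against the row index (axis=0).
    return other.rsub(s, axis=0).add_prefix("new_")


# First two days of the question's example data, on a 4-hour grid.
idx = pd.date_range("2018-01-05", periods=12, freq=pd.Timedelta(hours=4))
frame = pd.DataFrame(
    {
        "value1": [-3.0, -3.0, 1.0, -5.0, 0.0, -4.0, -5.0, -5.0, -2.0, -4.0, -1.0, 1.0],
        "value2": [8.0, 8.0, 8.0, 5.0, 5.0, 8.0, 8.0, 5.0, 5.0, 6.0, 6.0, 7.0],
    },
    index=idx,
)

# Left join on the index: only the midnight rows receive values.
out = frame.join(daily_diff(frame, "value1", time(4, 0), time(12, 0)))
print(out)
```

On this data the midnight rows get 2.0 and -8.0 on 2018-01-05, and -1.0 and -11.0 on 2018-01-06, matching the comments in the question's expected output. All other rows are NaN.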
Answer 2:
IIUC, you want to shift by time:
shifted = frame.shift(-8, freq='H')
frame['col1'] = frame['value1'] - shifted['value1']
frame['col2'] = frame['value1'] - shifted['value2']
Output (head):
                     value1  value2  col1  col2
datetime
2018-01-05 00:00:00    -3.0     8.0  -2.0  -8.0
2018-01-05 04:00:00     0.0     7.0  -2.0  -5.0
2018-01-05 08:00:00    -1.0     5.0  -1.0  -8.0
2018-01-05 12:00:00     2.0     5.0   3.0  -5.0
2018-01-05 16:00:00     0.0     7.0   4.0  -6.0
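Note that this output fills every row, whereas the question asks for a single value per day placed at 00:00. The following sketch shows what the time-based shift does on the first day of the question's sample data. It passes the offset as a `pd.Timedelta` instead of the `'H'` alias, and the last step (keeping only the 04:00 difference and relabelling it to midnight) is an assumption about how this approach might be adapted, not part of the answer.

```python
import pandas as pd
from datetime import time

idx = pd.date_range("2018-01-05", periods=6, freq=pd.Timedelta(hours=4))
frame = pd.DataFrame(
    {
        "value1": [-3.0, -3.0, 1.0, -5.0, 0.0, -4.0],
        "value2": [8.0, 8.0, 8.0, 5.0, 5.0, 8.0],
    },
    index=idx,
)

# Move every timestamp 8 hours earlier, so the 12:00 row is labelled 04:00.
shifted = frame.shift(freq=pd.Timedelta(hours=-8))

# Subtraction aligns on the index: at 04:00 this is value1[04:00] - value1[12:00].
diff = frame["value1"] - shifted["value1"]

# Assumed adaptation: keep only the 04:00 result and relabel it to midnight.
at_four = diff[diff.index.time == time(4, 0)]
at_four.index = at_four.index.floor("D")
frame["col1"] = at_four  # assignment aligns on the index; other rows become NaN
print(frame)
```

As with the original shift, this relies on a row existing exactly 8 hours after each 04:00 row. If the 12:00 row is missing, the difference for that day is NaN.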
Source: https://stackoverflow.com/questions/58543914/perform-calculation-between-rows-with-datetime-index-and-place-them-into-new-col