Question
This question follows on from an earlier one: Group by and fill missing datetime values
I'm trying to group a pandas DataFrame by contract, check whether there are duplicated datetime values, and fill in the missing hours while keeping the duplicates. If a day has a duplicated hour, it should end up with 25 entries in total; if not, 24.
My input is this:
contract datetime value1 value2
x 2019-01-01 00:00:00 50 60
x 2019-01-01 02:00:00 30 60
x 2019-01-01 02:00:00 70 80
x 2019-01-01 03:00:00 70 80
y 2019-01-01 00:00:00 30 100
With this Dataframe my output should be something like this:
contract date value1 value2
x 2019-01-01 [50, NaN, 30, 70, 70, NaN, NaN...] [60, NaN, NaN...]
y 2019-01-01 [30, NaN, NaN...] [100, NaN, NaN...]
Thank you very much.
Answer 1:
The idea is to first aggregate the duplicated rows into lists, so that the solution from the previous question can be reused:
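import pandas as pd

df['datetime'] = pd.to_datetime(df['datetime'])
# collect duplicated timestamps into one list per (contract, datetime)
df = df.groupby(['contract','datetime']).agg(list)

# reindex each contract to every hour of the days it covers
f = lambda x: x.reindex(pd.date_range(x.index.min().floor('D'),
                                      x.index.max().floor('D') + pd.Timedelta(hours=23),
                                      freq='h', name='datetime'))
# select columns with a list ([[...]]); tuple-style selection fails on recent pandas
df1 = (df.reset_index('contract')
         .groupby('contract')[['value1','value2']]
         .apply(f)
         .reset_index())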
Last, group by contract and date, and flatten the lists with chain.from_iterable:
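from itertools import chain

# select the value columns explicitly so the datetime column is not aggregated too
df2 = (df1.groupby(['contract', df1['datetime'].dt.date])[['value1','value2']]
          .agg(lambda x: list(chain.from_iterable(y if y == y else [y] for y in x)))
          .reset_index())
print(df2)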
contract datetime value1 \
0 x 2019-01-01 [50, nan, 30, 70, 70, nan, nan, nan, nan, nan,...
1 y 2019-01-01 [30, nan, nan, nan, nan, nan, nan, nan, nan, n...
value2
0 [60, nan, 60, 80, 80, nan, nan, nan, nan, nan,...
1 [100, nan, nan, nan, nan, nan, nan, nan, nan, ...
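The `y if y == y else [y]` expression in the lambda can be puzzling, so here is a minimal sketch of the flattening step on its own. The `cells` list is a made-up stand-in for one group after reindexing (lists for hours with data, a float NaN for hours that were missing); it is not taken from the real DataFrame.

```python
from itertools import chain
import math

# Hypothetical cells, shaped like one group after reindexing:
# lists for hours that have data, a float NaN for hours that were missing.
cells = [[50], float("nan"), [30, 70], [70]]

# NaN is the only value here that is not equal to itself, so `y == y`
# separates list cells (kept as-is) from NaN cells (wrapped in a list).
flat = list(chain.from_iterable(y if y == y else [y] for y in cells))

print(len(flat))            # 5
print(math.isnan(flat[1]))  # True
```

Wrapping the NaN in a one-element list is what lets `chain.from_iterable` treat every cell as an iterable, so a duplicated hour contributes two values and a missing hour contributes one NaN.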
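Test lengths:
# Series.map is used per column because DataFrame.applymap is deprecated in recent pandas
print (df2[['value1','value2']].apply(lambda s: s.map(len)))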
value1 value2
0 25 25
1 24 24
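For comparison, here is a self-contained sketch that reaches the same 25/24 lengths by a different route: a left merge of each (contract, day) group onto a full hourly grid, instead of `agg(list)` plus `reindex`. This is an alternative I am assuming is acceptable, not the method above; the names `grid`, `filled`, `rows` and `result` are only illustrative, and the input is rebuilt from the sample in the question.

```python
import pandas as pd

# Sample input rebuilt from the question.
df = pd.DataFrame({
    "contract": ["x", "x", "x", "x", "y"],
    "datetime": ["2019-01-01 00:00:00", "2019-01-01 02:00:00",
                 "2019-01-01 02:00:00", "2019-01-01 03:00:00",
                 "2019-01-01 00:00:00"],
    "value1": [50, 30, 70, 70, 30],
    "value2": [60, 60, 80, 80, 100],
})
df["datetime"] = pd.to_datetime(df["datetime"])

rows = []
day_key = df["datetime"].dt.normalize().rename("day")
for (contract, day), g in df.groupby(["contract", day_key]):
    # Full grid of the 24 hours of this day.
    grid = pd.DataFrame(
        {"datetime": pd.date_range(day, periods=24, freq=pd.Timedelta(hours=1))}
    )
    # Left merge keeps every grid hour; a duplicated hour yields two rows,
    # a missing hour yields one row of NaN.
    filled = grid.merge(g, on="datetime", how="left")
    rows.append({
        "contract": contract,
        "date": day.date(),
        "value1": filled["value1"].tolist(),
        "value2": filled["value2"].tolist(),
    })

result = pd.DataFrame(rows)
print(result["value1"].map(len).tolist())  # [25, 24]
```

The values come back as floats because the merge introduces NaN, which matches the spirit of the expected output in the question.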
Answer 2:
If I am understanding correctly, I think this might work:
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d')
Then just group by from there.
(Full disclosure: I didn't double-check, but I think that's the appropriate format to get YYYY-MM-DD.) Also, to avoid confusion, it might be worth renaming the ['datetime'] column to something else.
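As a small check on the format question: the sample strings in the question carry a time component, so a format that spells out the time is probably the safer assumption. The sketch below uses a made-up two-row Series (`raw`) rather than the real DataFrame, and the calendar date can then be taken from the parsed values for grouping.

```python
import pandas as pd

# Hypothetical two-row sample in the same shape as the question's datetime column.
raw = pd.Series(["2019-01-01 00:00:00", "2019-01-01 02:00:00"])

# The format includes the time part, matching the sample strings.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")

print(parsed.dt.hour.tolist())  # [0, 2]
print(parsed.dt.date.iloc[0])   # 2019-01-01
```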
Source: https://stackoverflow.com/questions/59373305/group-by-and-fill-missing-datetime-values-with-duplicates