Group by and fill missing datetime values with duplicates

Question

This question follows on from an earlier one: Group by and fill missing datetime values

I am trying to group a pandas DataFrame by contract, check whether there are duplicated datetime values, and fill in the missing hours. If a day contains a duplicated hour, it should end up with 25 values in total; if not, 24.

My input is this:

contract         datetime             value1          value2
   x       2019-01-01 00:00:00          50              60
   x       2019-01-01 02:00:00          30              60
   x       2019-01-01 02:00:00          70              80
   x       2019-01-01 03:00:00          70              80
   y       2019-01-01 00:00:00          30              100
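
For anyone who wants to reproduce this, the sample input above could be built roughly as follows. This is only a sketch: the name `dup_mask` is my own, and it simply marks rows whose (contract, datetime) pair occurs more than once.

```python
import pandas as pd

# Sample input from the question, typed in by hand.
df = pd.DataFrame({
    "contract": ["x", "x", "x", "x", "y"],
    "datetime": [
        "2019-01-01 00:00:00",
        "2019-01-01 02:00:00",
        "2019-01-01 02:00:00",
        "2019-01-01 03:00:00",
        "2019-01-01 00:00:00",
    ],
    "value1": [50, 30, 70, 70, 30],
    "value2": [60, 60, 80, 80, 100],
})
df["datetime"] = pd.to_datetime(df["datetime"])

# True for every row that shares its (contract, datetime) with another row.
dup_mask = df.duplicated(subset=["contract", "datetime"], keep=False)
print(df[dup_mask])
```

Here only the two 02:00 rows of contract x are flagged, which is the duplicated hour that should make that day 25 values long.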

With this Dataframe my output should be something like this:

contract         date              value1                     value2
   x           2019-01-01    [50,NaN,30,70,70,NaN,NaN...]    [60, NaN, NaN...]
   y           2019-01-01    [30, NaN, NaN...]               [100, NaN, NaN...]

Thank you very much.

Answer 1:

The idea is to first aggregate the values into lists per contract and datetime, so that the solution from the previous question can be reused:

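import pandas as pd

df['datetime'] = pd.to_datetime(df['datetime'])

# one row per (contract, datetime); duplicated hours become lists with 2+ items
df = df.groupby(['contract','datetime']).agg(list)

# reindex each contract to every hour between its first and last day
# ('h' is the hourly alias; the uppercase 'H' is deprecated since pandas 2.2)
f = lambda x: x.reindex(pd.date_range(x.index.min().floor('d'),
                                      x.index.max().floor('d') + pd.Timedelta(hours=23),
                                      freq='h', name='datetime'))
# select the columns with a list ([[...]]); the tuple form ['value1','value2']
# is no longer accepted by recent pandas versions
df1 = (df.reset_index('contract')
         .groupby('contract')[['value1','value2']]
         .apply(f)
         .reset_index())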

Finally, group by contract and date, and flatten the lists with chain.from_iterable:

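from itertools import chain

# y == y is False only for NaN (the missing hours), so those are wrapped
# in a one-item list before everything is chained into a single flat list
df2 = (df1.groupby(['contract', df1['datetime'].dt.date])
         .agg(lambda x: list(chain.from_iterable(y if y==y else [y] for y in x)))
         .reset_index()
         )
print(df2)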
  contract    datetime                                             value1  \
0        x  2019-01-01  [50, nan, 30, 70, 70, nan, nan, nan, nan, nan,...   
1        y  2019-01-01  [30, nan, nan, nan, nan, nan, nan, nan, nan, n...   

                                              value2  
0  [60, nan, 60, 80, 80, nan, nan, nan, nan, nan,...  
1  [100, nan, nan, nan, nan, nan, nan, nan, nan, ...
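
To make the flattening step easier to follow in isolation, here is a small sketch on a hand-written list of cells. The `cells` values are made up to mimic the first four hours of contract x, and I use `isinstance` instead of the `y == y` trick purely for readability; it is not the exact expression from the answer.

```python
from itertools import chain

nan = float("nan")

# Hypothetical cells for hours 00..03: present hours hold lists,
# the missing hour holds a bare NaN, the duplicated hour holds two values.
cells = [[50], nan, [30, 70], [70]]

# Wrap bare NaN in a list so every element is iterable, then chain them.
flat = list(chain.from_iterable(c if isinstance(c, list) else [c] for c in cells))
print(flat)
```

The duplicated hour contributes two entries, which is why a day with one duplicate ends up with 25 values instead of 24.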

Check the lengths:

# on pandas >= 2.1, DataFrame.map replaces the deprecated applymap
print(df2[['value1','value2']].applymap(len))
   value1  value2
0      25      25
1      24      24
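
If the chained groupby calls are hard to adapt, the same result can probably be reached with a plain loop. The following is only a sketch under my own assumptions, not the method from the answer: `hourly_lists` is a hypothetical helper, and unlike the reindex approach it only emits rows for dates that actually appear in the data (it does not create rows for whole days that are missing).

```python
import pandas as pd

def hourly_lists(df, value_cols):
    """One row per (contract, date); each value column becomes a list with
    one entry per hour, NaN for missing hours, extra entries for duplicates."""
    df = df.copy()
    df["datetime"] = pd.to_datetime(df["datetime"])
    df["date"] = df["datetime"].dt.normalize()
    rows = []
    for (contract, day), grp in df.groupby(["contract", "date"], sort=True):
        # the 24 hourly timestamps of this day
        hours = pd.date_range(day, periods=24, freq=pd.Timedelta(hours=1))
        row = {"contract": contract, "date": day.date()}
        for col in value_cols:
            # collect all values seen for each timestamp, in input order
            by_hour = {}
            for ts, val in zip(grp["datetime"], grp[col]):
                by_hour.setdefault(ts, []).append(val)
            out = []
            for h in hours:
                out.extend(by_hour.get(h, [float("nan")]))
            row[col] = out
        rows.append(row)
    return pd.DataFrame(rows)

# Sample input from the question.
df = pd.DataFrame({
    "contract": ["x", "x", "x", "x", "y"],
    "datetime": ["2019-01-01 00:00:00", "2019-01-01 02:00:00",
                 "2019-01-01 02:00:00", "2019-01-01 03:00:00",
                 "2019-01-01 00:00:00"],
    "value1": [50, 30, 70, 70, 30],
    "value2": [60, 60, 80, 80, 100],
})
result = hourly_lists(df, ["value1", "value2"])
print(result)
```

A loop like this is slower than the vectorised version on large frames, but each step can be inspected, which may help when checking the 24 versus 25 lengths.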

Answer 2:

If I am understanding correctly, I think this might work:

df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d')

Then just group by from there.

(Full disclosure: I didn't double-check, but I think that's the appropriate format to get YYYY-MM-DD.) Also, to avoid confusion, it might be worth renaming the 'datetime' column to something else.
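
One caveat on the suggestion above: as far as I know, the format argument of pd.to_datetime only describes how the input strings are parsed and does not truncate the result to a date. A sketch of deriving a separate date key for grouping, assuming the column has already been parsed, might look like this (the `date` column name and the `counts` variable are my own):

```python
import pandas as pd

df = pd.DataFrame({
    "contract": ["x", "x", "y"],
    "datetime": pd.to_datetime([
        "2019-01-01 00:00:00",
        "2019-01-01 02:00:00",
        "2019-01-01 00:00:00",
    ]),
    "value1": [50, 30, 30],
})

# Keep the full timestamp and add a plain date column to group on.
df["date"] = df["datetime"].dt.date
counts = df.groupby(["contract", "date"]).size()
print(counts)
```

Keeping the date in its own column also sidesteps the naming confusion mentioned above, since 'datetime' still holds the full timestamp.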

Source: https://stackoverflow.com/questions/59373305/group-by-and-fill-missing-datetime-values-with-duplicates

Tags

python

pandas

dataframe