How to create a new row on the fly by copying previous row

Question

I have a dataframe as given below:

Edited dataframe:

import pandas as pd
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time_1': ['2173-04-03 12:35:00', '2173-04-03 12:50:00', '2173-04-05 12:59:00',
               '2173-05-04 13:14:00', '2173-05-05 13:37:00', '2173-07-06 13:39:00',
               '2173-07-08 11:30:00', '2173-04-08 16:00:00', '2173-04-09 22:00:00',
               '2173-04-11 04:00:00', '2173-04-13 04:30:00', '2173-04-14 08:00:00'],
    'val': [5, 5, 5, 5, 1, 6, 5, 5, 8, 3, 4, 6]})
df['time_1'] = pd.to_datetime(df['time_1'])  # parse the strings into datetimes
df['day'] = df['time_1'].dt.day

What I would like to do is create a new record for each missing day.

As shown in the screenshot below, for subject_id = 1 the record for the 4th day is missing. So what I am trying to do is copy the immediately preceding row.

I tried the following, but it didn't help:

df.groupby('subject_id')['day'].eq(df['day'].shift(-1)).add(1)

The new record should have the same content as the previous row; only the date value should be modified (d+1), as shown below.

I expect my output to look like the one below for each subject_id, with a new record added for day 4. Please note that the time component of a new row doesn't really matter; it can be anything (e.g. 00:00:00).

I only wish to add missing dates within the range of days covered in a month. For example, subject_id = 1 has records in the 4th month from the 3rd to the 5th, but the 4th is missing, so we add a record for the 4th day alone. We don't need the 6th, 7th, etc.

[Screenshot: edited expected output]
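The month-bounded requirement can be sketched as follows. This is a minimal sketch, not the asker's or answerer's code: it assumes "month" means the calendar month of `time_1`, and the function name `fill_gaps_within_month` and the helper key `ym` are hypothetical. For every (subject_id, month) it builds each day between the first and last record, left-joins the original rows, forward-fills the immediately preceding row into each gap, and moves the copied `time_1` onto the new day while keeping its time of day.

```python
import pandas as pd


def fill_gaps_within_month(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add one row per missing day between the first and
    last record of each (subject_id, calendar month), copying the preceding row."""
    out = df.copy()
    out['date'] = out['time_1'].dt.normalize()
    # hypothetical month key: year * 100 + month, e.g. 217304 for April 2173
    out['ym'] = out['date'].dt.year * 100 + out['date'].dt.month

    # every calendar day from the first to the last record of each (subject, month)
    spans = out.groupby(['subject_id', 'ym'])['date'].agg(['min', 'max']).reset_index()
    full = pd.concat(
        [pd.DataFrame({'subject_id': s, 'ym': ym,
                       'date': pd.date_range(lo, hi, freq='D')})
         for s, ym, lo, hi in spans.itertuples(index=False)],
        ignore_index=True)

    # left join keeps existing rows (including same-day duplicates)
    # and adds all-NaN rows for the missing days
    merged = full.merge(out, on=['subject_id', 'ym', 'date'], how='left')

    # copy the immediately preceding row into each new row, per subject and month
    value_cols = [c for c in merged.columns if c not in ('subject_id', 'ym', 'date')]
    merged[value_cols] = merged.groupby(['subject_id', 'ym'])[value_cols].ffill()

    # shift each timestamp onto its row's date (a zero shift for original rows)
    merged['time_1'] = merged['time_1'] + (merged['date'] - merged['time_1'].dt.normalize())
    merged['day'] = merged['time_1'].dt.day
    merged['val'] = merged['val'].astype(int)  # the join's NaNs had made it float
    return merged.drop(columns=['date', 'ym'])


df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time_1': pd.to_datetime([
        '2173-04-03 12:35:00', '2173-04-03 12:50:00', '2173-04-05 12:59:00',
        '2173-05-04 13:14:00', '2173-05-05 13:37:00', '2173-07-06 13:39:00',
        '2173-07-08 11:30:00', '2173-04-08 16:00:00', '2173-04-09 22:00:00',
        '2173-04-11 04:00:00', '2173-04-13 04:30:00', '2173-04-14 08:00:00']),
    'val': [5, 5, 5, 5, 1, 6, 5, 5, 8, 3, 4, 6]})

result = fill_gaps_within_month(df)
print(result)
```

On the question's edited data this should add four rows: 2173-04-04 and 2173-07-07 for subject_id 1, and 2173-04-10 and 2173-04-12 for subject_id 2. Nothing is added between 2173-04-05 and 2173-05-04, because the ranges are computed per month rather than per subject.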

Answer 1:

There are duplicated dates after removing the times, so you can create a helper DataFrame with all dates per subject_id:

# 'date' is time_1 with the time of day removed
df['date'] = df['time_1'].dt.normalize()
df1 = (df.set_index('date')
         .groupby('subject_id')
         .resample('d')
         .last()
         .index
         .to_frame(index=False))
print (df1)
    subject_id       date
0            1 2173-04-03
1            1 2173-04-04
2            1 2173-04-05
3            1 2173-04-06
4            2 2173-04-08
5            2 2173-04-09
6            2 2173-04-10
7            2 2173-04-11
8            2 2173-04-12
9            2 2173-04-13
10           2 2173-04-14

Then use DataFrame.merge with a left join and forward-fill the missing values per subject_id:

df2 = df1.merge(df, how='left').groupby('subject_id', as_index=False).ffill()
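In current pandas versions, `groupby(...).ffill()` generally returns only the non-grouping columns, which is consistent with the printed df2 further down having no subject_id column. If that column is needed, one option is to fill only the value columns in place. A minimal sketch on hypothetical toy data (a single subject missing 2173-04-04, with the helper frame built by hand for brevity):

```python
import pandas as pd

# hypothetical toy data: subject 1 has no row for 2173-04-04
df = pd.DataFrame({
    'subject_id': [1, 1],
    'time_1': pd.to_datetime(['2173-04-03 12:50:00', '2173-04-05 12:59:00']),
    'val': [5, 7],
})
df['date'] = df['time_1'].dt.normalize()

# helper frame with every date for the subject
df1 = pd.DataFrame({'subject_id': 1,
                    'date': pd.date_range('2173-04-03', '2173-04-05', freq='D')})

df2 = df1.merge(df, how='left')  # joins on the shared columns subject_id and date
cols = ['time_1', 'val']
# forward-fill only the value columns, so subject_id stays in df2
df2[cols] = df2.groupby('subject_id')[cols].ffill()
print(df2)
```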

Lastly, the days need to be added to the newly created datetimes. One possible solution is to add timedeltas created from the difference between the date values and the normalized time_1 values:

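import numpy as np
# calendar day of each (possibly copied) timestamp
dates = df2['time_1'].dt.normalize()
# shift copied rows forward by the gap between their new date and the copied day
df2['time_1'] += np.where(dates == df2['date'], 0, df2['date'] - dates)
df2['day'] = df2['time_1'].dt.day
# the left join introduced NaN, which turned val into float; restore int
df2['val'] = df2['val'].astype(int)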
print (df2)

         date              time_1  val  day
0  2173-04-03 2173-04-03 12:35:00    5    3
1  2173-04-03 2173-04-03 12:50:00    5    3
2  2173-04-03 2173-04-03 12:59:00    5    3
3  2173-04-04 2173-04-04 13:14:00    5    4
4  2173-04-04 2173-04-04 13:37:00    1    4
5  2173-04-05 2173-04-05 13:37:00    1    5
6  2173-04-06 2173-04-06 13:39:00    6    6
7  2173-04-06 2173-04-06 11:30:00    5    6
8  2173-04-08 2173-04-08 16:00:00    5    8
9  2173-04-09 2173-04-09 22:00:00    8    9
10 2173-04-10 2173-04-10 22:00:00    8   10
11 2173-04-11 2173-04-11 04:00:00    3   11
12 2173-04-12 2173-04-12 04:00:00    3   12
13 2173-04-13 2173-04-13 04:30:00    4   13
14 2173-04-14 2173-04-14 08:00:00    6   14
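Because `df2['date'] - dates` is already a zero timedelta wherever the two values agree, the `np.where` guard above is probably not needed; adding the difference directly should give the same result. A minimal check on a hypothetical three-row frame shaped like df2 after the forward fill, where the middle row was copied from 2173-04-03:

```python
import pandas as pd

# hypothetical df2-like frame after ffill: row 1 is a copy of the 2173-04-03 row
df2 = pd.DataFrame({
    'date': pd.to_datetime(['2173-04-03', '2173-04-04', '2173-04-05']),
    'time_1': pd.to_datetime(['2173-04-03 12:50:00', '2173-04-03 12:50:00',
                              '2173-04-05 12:59:00']),
})

# 0 days for original rows, +1 day for the copied row
shift = df2['date'] - df2['time_1'].dt.normalize()
df2['time_1'] = df2['time_1'] + shift
print(df2['time_1'].tolist())
```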

Source: https://stackoverflow.com/questions/57784410/how-to-create-a-new-row-on-the-fly-by-copying-previous-row

Tags

python

python-3.x

pandas

dataframe

pandas-groupby