Question
This is my data frame:
    PatientNumber  QT              Answer  Answerdate  DiagnosisDate
1   1              transferring    No      2017-03-03  2018-05-03
2   1              preparing food  No      2017-03-03  2018-05-03
3   1              medications     Yes     2017-03-03  2018-05-03
4   2              transferring    No      2011-05-10  2012-05-04
5   2              preparing food  No      2011-05-10  2012-05-04
6   2              medications     No      2011-05-10  2012-05-04
7   2              transferring    Yes     2011-15-03  2012-05-04
8   2              preparing food  Yes     2011-15-03  2012-05-04
9   2              medications     No      2011-15-03  2012-05-04
10  2              transferring    Yes     2010-15-12  2012-05-04
11  2              preparing food  No      2010-15-12  2012-05-04
12  2              medications     No      2010-15-12  2012-05-04
13  2              transferring    Yes     2009-10-10  2012-05-04
14  2              preparing food  No      2009-10-10  2012-05-04
15  2              medications     No      2009-10-10  2012-05-04
16  3              medications     No      2008-10-10  2010-07-04
I found only one question related to mine, and it did not get a correct answer.
Some explanation: each PatientNumber has a single DiagnosisDate, while Answerdate records the several times the patient filled in a questionnaire.
What I want to do:
My goal is to step back from DiagnosisDate in six-month intervals and record, in a new column, which interval each row falls into (the first six months before diagnosis, the second, the third, ...).
For example, DiagnosisDate for PatientNumber=1 is 2018-05-03, so we go back six months from that date. The first six-month boundary is 2017-27-11; since the latest Answerdate does not fall within that period, it is not marked as the first six months.
If the first Answerdate had fallen within that period, it would be marked as the first six months.
So here PatientNumber=1 gets 3 in the 6month column, because when we go back from DiagnosisDate in six-month steps, the Answerdate falls within the third six-month period.
So the output for this data frame will be:
    PatientNumber  QT              Answer  Answerdate  DiagnosisDate  6month
1   1              transferring    No      2017-03-03  2018-05-03     3
2   1              preparing food  No      2017-03-03  2018-05-03     3
3   1              medications     Yes     2017-03-03  2018-05-03     3
4   2              transferring    No      2011-05-10  2012-05-04     1
5   2              preparing food  No      2011-05-10  2012-05-04     1
6   2              medications     No      2011-05-10  2012-05-04     1
7   2              transferring    Yes     2011-15-04  2012-05-04     2
8   2              preparing food  Yes     2011-15-04  2012-05-04     2
9   2              medications     No      2011-15-04  2012-05-04     2
10  2              transferring    Yes     2010-15-12  2012-05-04     3
11  2              preparing food  No      2010-15-12  2012-05-04     3
12  2              medications     No      2010-15-12  2012-05-04     3
13  2              transferring    Yes     2009-10-10  2012-05-04     5
14  2              preparing food  No      2009-10-10  2012-05-04     5
15  2              medications     No      2009-10-10  2012-05-04     5
16  3              medications     No      2008-10-10  2010-07-04     4
For PatientNumber=2, it starts from DiagnosisDate=2012-05-04 and goes back six months, which gives 2011-11-04.
I applied this:
data['6month'] = pd.date_range(end=data['diagnosisdate'],periods=2, freq='6M',closed='left')
First, it only considers the month, so the calculation is approximate rather than exact. Second, I could not find a way to get the number of the six-month period, as in the data frame above (in the 6month column I want 1, 2, ... instead of a date).
So, depending on the data, the 6month column may contain values from 1 to 10 (considering five years before diagnosis).
It's a long story; I hope someone can take the time :).
I also need to keep all the existing columns in the result as they are.
Answer 1:
It's not exactly what you want, but a workaround that gives good enough results. I think you can do it by calculating the time difference between the columns DiagnosisDate and Answerdate and dividing by a six-month duration, written here as pd.Timedelta(days=365.2425 / 2). This is the same length that np.timedelta64(6, 'M') stands for, but newer pandas versions may reject the ambiguous 'M' unit, and the pd.np alias is no longer available. Then apply ceil to round up to the next integer, for example (with numpy imported as np):
data['6month'] = (np.ceil((data['DiagnosisDate'] - pd.Timedelta(days=1) - data['Answerdate'])
                          / pd.Timedelta(days=365.2425 / 2)).astype(int))
To drop rows with negative values:
data = data[(data['6month'] >= 0)]
With your sample, it gives:
    PatientNumber  QT            Answer  Answerdate  DiagnosisDate  6month
1   1              transferring  No      2017-03-03  2018-03-05     3
2   1              preparing     No      2017-03-03  2018-03-05     3
3   1              medications   Yes     2017-03-03  2018-03-05     3
4   2              transferring  No      2011-10-05  2012-04-05     1
5   2              preparing     No      2011-10-05  2012-04-05     1
6   2              medications   No      2011-10-05  2012-04-05     1
7   2              transferring  Yes     2011-03-15  2012-04-05     3
8   2              preparing     Yes     2011-03-15  2012-04-05     3
9   2              medications   No      2011-03-15  2012-04-05     3
10  2              transferring  Yes     2010-12-15  2012-04-05     3
11  2              preparing     No      2010-12-15  2012-04-05     3
12  2              medications   No      2010-12-15  2012-04-05     3
13  2              transferring  Yes     2009-10-10  2012-04-05     5
14  2              preparing     No      2009-10-10  2012-04-05     5
15  2              medications   No      2009-10-10  2012-04-05     5
16  3              medications   No      2008-10-10  2010-04-07     3
Also, I would not use pd.date_range, as it does not seem to do what you want, but I might be wrong.
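If exact calendar boundaries matter more than speed, one possible alternative (not part of the approach above) is to step back with pd.DateOffset(months=6) and check which window each Answerdate falls into. The following is only a sketch: the helper name six_month_bucket is hypothetical, the limit of 10 windows comes from the "five years before diagnosis" remark in the question, and treating the lower boundary as inclusive is an assumption.

```python
import pandas as pd

def six_month_bucket(answer_date, diagnosis_date, max_buckets=10):
    """Return k if answer_date lies in the k-th six-month window
    before diagnosis_date, or None if it lies in none of them.

    Assumption: window k covers
    [diagnosis - 6*k months, diagnosis - 6*(k-1) months),
    so the older boundary is inclusive and the diagnosis day is excluded.
    """
    upper = diagnosis_date
    for k in range(1, max_buckets + 1):
        lower = diagnosis_date - pd.DateOffset(months=6 * k)
        if lower <= answer_date < upper:
            return k
        upper = lower
    return None

# Boundary from the question: six months before 2012-05-04.
first_boundary = pd.Timestamp("2012-05-04") - pd.DateOffset(months=6)  # 2011-11-04

# One row, using the dates as they appear in the output table above.
bucket = six_month_bucket(pd.Timestamp("2009-10-10"), pd.Timestamp("2012-04-05"))  # 5
```

On a whole frame this could be applied row by row, for example with a list comprehension over zip(data['Answerdate'], data['DiagnosisDate']); that is slower than the vectorised division but avoids the fixed-length approximation of a month.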
EDIT: to remove the cases where DiagnosisDate is before Answerdate, once you have created the 6month column, just do data = data[data['6month'] > 0], as the value would be negative or zero in these cases.
Source: https://stackoverflow.com/questions/50862198/date-range-for-six-monthly-in-pandas
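To try the whole approach end to end, here is a minimal self-contained sketch. The four rows are illustrative only: three reuse dates from the output table and the last one is a made-up case where the answer comes after the diagnosis. Six months is assumed to be half of an average 365.2425-day year, which is the length np.timedelta64(6, 'M') represents.

```python
import numpy as np
import pandas as pd

# Illustrative rows; the last one (answer after diagnosis) is hypothetical.
data = pd.DataFrame({
    "PatientNumber": [1, 2, 2, 3],
    "Answerdate": pd.to_datetime(
        ["2017-03-03", "2011-10-05", "2009-10-10", "2010-08-01"]),
    "DiagnosisDate": pd.to_datetime(
        ["2018-03-05", "2012-04-05", "2012-04-05", "2010-04-07"]),
})

# Assumption: a fixed-length "six months" of 182.62125 days.
SIX_MONTHS = pd.Timedelta(days=365.2425 / 2)

# Time from the answer to the day before diagnosis, in six-month units,
# rounded up so the most recent window is numbered 1.
elapsed = data["DiagnosisDate"] - pd.Timedelta(days=1) - data["Answerdate"]
data["6month"] = np.ceil(elapsed / SIX_MONTHS).astype(int)

# Keep only answers given before the diagnosis; all other columns are kept.
data = data[data["6month"] > 0]
```

With these rows the remaining 6month values are 3, 1 and 5, and the row whose Answerdate follows the DiagnosisDate is dropped.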