date range for six monthly in pandas

Posted by 感情迁移 on 2019-12-21 20:59:36

Question


So, this is my data frame.

PatientNumber           QT         Answer   Answerdate  DiagnosisDate 
1        1          transferring     No      2017-03-03 2018-05-03     
2        1          preparing food   No      2017-03-03 2018-05-03     
3        1          medications      Yes     2017-03-03 2018-05-03     
4        2          transferring     No      2011-05-10 2012-05-04       
5        2          preparing food   No      2011-05-10 2012-05-04     
6        2          medications      No      2011-05-10 2012-05-04     
7        2          transferring     Yes     2011-15-03  2012-05-04     
8        2          preparing food   Yes     2011-15-03  2012-05-04     
9        2          medications      No      2011-15-03  2012-05-04     
10       2          transferring     Yes     2010-15-12 2012-05-04     
11       2          preparing food   No      2010-15-12 2012-05-04     
12       2          medications      No      2010-15-12 2012-05-04     
13       2          transferring     Yes     2009-10-10 2012-05-04     
14       2          preparing food   No      2009-10-10 2012-05-04     
15       2          medications      No      2009-10-10 2012-05-04     
16       3          medications      No      2008-10-10 2010-07-04     

I found just one link related to my question here, but it did not get any correct answer.

Some explanation: for each PatientNumber, DiagnosisDate is unique, and Answerdate holds the several dates on which the patient filled in a questionnaire.

What I want to do:

My goal is to step back from DiagnosisDate in six-month intervals and record in a new column which interval each Answerdate falls into (the first six months before diagnosis, the second, the third, ...).

For example, DiagnosisDate for PatientNumber=1 is 2018-05-03, so we go back six months from that date. The first six-month boundary is 2017-27-11; since the latest Answerdate does not fall within that window, it is not marked as the first six months. If an Answerdate fell within that window, it would be marked as the first six months.

So PatientNumber=1 gets 3 in the 6month column, because its Answerdate falls in the third six-month window counting back from DiagnosisDate. The output for this dataframe would be:

PatientNumber           QT         Answer   Answerdate  DiagnosisDate  6month
1        1          transferring     No      2017-03-03 2018-05-03     3
2        1          preparing food   No      2017-03-03 2018-05-03     3
3        1          medications      Yes     2017-03-03 2018-05-03     3
4        2          transferring     No      2011-05-10 2012-05-04     1 
5        2          preparing food   No      2011-05-10 2012-05-04     1
6        2          medications      No      2011-05-10 2012-05-04     1
7        2          transferring     Yes     2011-15-04  2012-05-04    2
8        2          preparing food   Yes     2011-15-04  2012-05-04    2
9        2          medications      No      2011-15-04  2012-05-04    2
10       2          transferring     Yes     2010-15-12 2012-05-04     3
11       2          preparing food   No      2010-15-12 2012-05-04     3
12       2          medications      No      2010-15-12 2012-05-04     3
13       2          transferring     Yes     2009-10-10 2012-05-04     5
14       2          preparing food   No      2009-10-10 2012-05-04     5
15       2          medications      No      2009-10-10 2012-05-04     5
16       3          medications      No      2008-10-10 2010-07-04     4

For PatientNumber=2, we start from DiagnosisDate=2012-05-04 and go back six months, which gives 2011-11-04.
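The six-month boundaries described above can be sketched with `pd.DateOffset`, which steps back by calendar months rather than by a fixed number of days. This is only an illustration of the date arithmetic, assuming the dates are in YYYY-MM-DD order; the variable names are made up for the example.

```python
import pandas as pd

diagnosis = pd.Timestamp("2012-05-04")

# Lower boundaries of the first three six-month windows before diagnosis.
boundaries = [diagnosis - pd.DateOffset(months=6 * k) for k in range(1, 4)]

print([b.strftime("%Y-%m-%d") for b in boundaries])
# ['2011-11-04', '2011-05-04', '2010-11-04']
```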

I applied this:

data['6month'] = pd.date_range(end=data['diagnosisdate'],periods=2, freq='6M',closed='left')

First, it only looks at months, so the result is approximate rather than exact. Second, I could not find a way to get the number of the six-month period (1, 2, ... as in the 6month column of the dataframe above) instead of a date.

So, depending on the data, the 6month column may contain numbers from 1 to 10 (considering the five years before diagnosis).

Long story, I hope someone can take the time :).

I also need to keep all the existing columns in the result as they are.


Answer 1:


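It's not exactly what you want, but a workaround that gives good enough results. You can calculate the time difference between the DiagnosisDate and Answerdate columns and divide it by a six-month duration. Older pandas versions allowed pd.np.timedelta64(6, 'M') for this; since pd.np was removed in pandas 2.0 and the 'M' unit is ambiguous as a duration, the average length of six months in days is used here. Then apply the ceil function to round up to the next integer:

import numpy as np
import pandas as pd

six_months = pd.Timedelta(days=365.2425 / 2)  # average length of six months
data['6month'] = np.ceil((data['DiagnosisDate'] - pd.Timedelta(days=1) - data['Answerdate'])
                         / six_months).astype(int)

To ignore rows with negative values: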

data = data[(data['6month'] >= 0)]

With your sample, it gives:

    PatientNumber            QT Answer Answerdate DiagnosisDate  6month
1               1  transferring     No 2017-03-03    2018-03-05       3
2               1     preparing     No 2017-03-03    2018-03-05       3
3               1   medications    Yes 2017-03-03    2018-03-05       3
4               2  transferring     No 2011-10-05    2012-04-05       1
5               2     preparing     No 2011-10-05    2012-04-05       1
6               2   medications     No 2011-10-05    2012-04-05       1
7               2  transferring    Yes 2011-03-15    2012-04-05       3
8               2     preparing    Yes 2011-03-15    2012-04-05       3
9               2   medications     No 2011-03-15    2012-04-05       3
10              2  transferring    Yes 2010-12-15    2012-04-05       3
11              2     preparing     No 2010-12-15    2012-04-05       3
12              2   medications     No 2010-12-15    2012-04-05       3
13              2  transferring    Yes 2009-10-10    2012-04-05       5
14              2     preparing     No 2009-10-10    2012-04-05       5
15              2   medications     No 2009-10-10    2012-04-05       5
16              3   medications     No 2008-10-10    2010-04-07       3
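For anyone who wants to reproduce the column above, here is a minimal self-contained sketch on four of the sample rows. It assumes the dates are parsed as in the table above and uses a fixed-length `pd.Timedelta` (the average length of six months) as the divisor, which is an assumption of this sketch rather than an exact calendar calculation.

```python
import numpy as np
import pandas as pd

# Four rows taken from the table above (one per distinct date pair).
data = pd.DataFrame({
    "PatientNumber": [1, 2, 2, 3],
    "Answerdate": pd.to_datetime(
        ["2017-03-03", "2011-10-05", "2011-03-15", "2008-10-10"]),
    "DiagnosisDate": pd.to_datetime(
        ["2018-03-05", "2012-04-05", "2012-04-05", "2010-04-07"]),
})

# Approximate six months as half of an average Gregorian year.
six_months = pd.Timedelta(days=365.2425 / 2)

# Shift by one day so an answer exactly six months back stays in window 1.
elapsed = data["DiagnosisDate"] - pd.Timedelta(days=1) - data["Answerdate"]
data["6month"] = np.ceil(elapsed / six_months).astype(int)

print(data["6month"].tolist())
# [3, 1, 3, 3]
```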

Also, I would not use pd.date_range, as it does not seem to do what you want, but I might be wrong.

EDIT: to remove the cases where DiagnosisDate is before Answerdate, once you have created the 6month column, just do data = data[data['6month'] > 0], as the value is negative or zero in that case.
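If the approximation is not precise enough, one possible variant counts whole calendar months instead of dividing by a fixed duration. The sketch below is an assumption about the intended window definition, not part of the approach above: `six_month_window` is a hypothetical helper, it keeps the same one-day shift, and it compares day-of-month values directly, so month-end dates are not clipped the way `pd.DateOffset` would clip them.

```python
import pandas as pd

def six_month_window(answer: pd.Series, diagnosis: pd.Series) -> pd.Series:
    """Hypothetical helper: 1 = the six months just before diagnosis, 2 = the six before that, ..."""
    ref = diagnosis - pd.Timedelta(days=1)
    # Whole calendar months between the answer date and the shifted diagnosis date.
    months = (ref.dt.year - answer.dt.year) * 12 + (ref.dt.month - answer.dt.month)
    months = months - (ref.dt.day < answer.dt.day).astype(int)
    # Floor division keeps answers on or after the diagnosis date at 0 or below.
    return months // 6 + 1

answer = pd.Series(pd.to_datetime(
    ["2017-03-03", "2011-10-05", "2011-03-15", "2008-10-10", "2012-05-01"]))
diagnosis = pd.Series(pd.to_datetime(
    ["2018-03-05", "2012-04-05", "2012-04-05", "2010-04-07", "2012-04-05"]))

windows = six_month_window(answer, diagnosis)
print(windows.tolist())
# [3, 1, 3, 3, 0]
```

The last row has an Answerdate after the DiagnosisDate, so it gets a value of 0 and would be dropped by the `> 0` filter.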



Source: https://stackoverflow.com/questions/50862198/date-range-for-six-monthly-in-pandas
