Question
So, this is my data frame.
PatientNumber QT Answer Answerdate DiagnosisDate
1 1 transferring No 2017-03-03 2018-05-03
2 1 preparing food No 2017-03-03 2018-05-03
3 1 medications Yes 2017-03-03 2018-05-03
4 2 transferring No 2011-05-10 2012-05-04
5 2 preparing food No 2011-05-10 2012-05-04
6 2 medications No 2011-05-10 2012-05-04
7 2 transferring Yes 2011-15-03 2012-05-04
8 2 preparing food Yes 2011-15-03 2012-05-04
9 2 medications No 2011-15-03 2012-05-04
10 2 transferring Yes 2010-15-12 2012-05-04
11 2 preparing food No 2010-15-12 2012-05-04
12 2 medications No 2010-15-12 2012-05-04
13 2 transferring Yes 2009-10-10 2012-05-04
14 2 preparing food No 2009-10-10 2012-05-04
15 2 medications No 2009-10-10 2012-05-04
16 3 medications No 2008-10-10 2010-07-04
I found only one question related to mine here, and it did not get a correct answer.
Some explanation: for each PatientNumber, DiagnosisDate is unique, and Answerdate records each of the several times the patient filled in a questionnaire.
What I want to do:
My goal is to go back from DiagnosisDate in six-month steps and record in a new column which six-month period each record falls into (the first six months, the second, the third, ...).
For example, in this data frame DiagnosisDate for PatientNumber=1 is 2018-05-03, so it should go back six months from that date. The first six-month boundary is 2017-27-11. As the latest Answerdate does not fall within that period, it is not marked as the first six months. If the first Answerdate fell within that period, it would be marked as the first six months.
So here PatientNumber=1 gets 3 in the column 6month, because going back from DiagnosisDate six months at a time, the Answerdate only falls within the period on the third step.
So the output for this data frame will be:
PatientNumber QT Answer Answerdate DiagnosisDate 6month
1 1 transferring No 2017-03-03 2018-05-03 3
2 1 preparing food No 2017-03-03 2018-05-03 3
3 1 medications Yes 2017-03-03 2018-05-03 3
4 2 transferring No 2011-05-10 2012-05-04 1
5 2 preparing food No 2011-05-10 2012-05-04 1
6 2 medications No 2011-05-10 2012-05-04 1
7 2 transferring Yes 2011-15-04 2012-05-04 2
8 2 preparing food Yes 2011-15-04 2012-05-04 2
9 2 medications No 2011-15-04 2012-05-04 2
10 2 transferring Yes 2010-15-12 2012-05-04 3
11 2 preparing food No 2010-15-12 2012-05-04 3
12 2 medications No 2010-15-12 2012-05-04 3
13 2 transferring Yes 2009-10-10 2012-05-04 5
14 2 preparing food No 2009-10-10 2012-05-04 5
15 2 medications No 2009-10-10 2012-05-04 5
16 3 medications No 2008-10-10 2010-07-04 4
For PatientNumber=2, it will start from DiagnosisDate=2012-05-04 and go back six months, which gives 2011-11-04.
I applied this:
data['6month'] = pd.date_range(end=data['diagnosisdate'],periods=2, freq='6M',closed='left')
First, it only looks at the month, so the calculation is approximate rather than exact. Second, I could not find a way to record the number of the six-month period, as in the data frame above (in the column 6month I put 1, 2, ... instead of a date).
Depending on the data, the column 6month may therefore contain the numbers 1...10 (considering 5 years before diagnosis).
Long story; I hope someone can take the time :).
I also need to keep all the existing columns in the result as they are.
Answer 1:
It's not exactly what you want, but a workaround that gives good enough results. I think you can do it by calculating the time difference between the columns DiagnosisDate and Answerdate and dividing by np.timedelta64(6, 'M') (to change the frequency to 6 months). Then you need the ceil function to round up to the next integer, such as:
import numpy as np  # pd.np is only an alias for numpy and is gone from recent pandas
data['6month'] = (np.ceil((data['DiagnosisDate']-pd.Timedelta(days=1)-data['Answerdate'])
/np.timedelta64(6, 'M')).astype(int))
To ignore rows with negative values:
data = data[(data['6month'] >= 0)]
With your sample, it gives:
PatientNumber QT Answer Answerdate DiagnosisDate 6month
1 1 transferring No 2017-03-03 2018-03-05 3
2 1 preparing No 2017-03-03 2018-03-05 3
3 1 medications Yes 2017-03-03 2018-03-05 3
4 2 transferring No 2011-10-05 2012-04-05 1
5 2 preparing No 2011-10-05 2012-04-05 1
6 2 medications No 2011-10-05 2012-04-05 1
7 2 transferring Yes 2011-03-15 2012-04-05 3
8 2 preparing Yes 2011-03-15 2012-04-05 3
9 2 medications No 2011-03-15 2012-04-05 3
10 2 transferring Yes 2010-12-15 2012-04-05 3
11 2 preparing No 2010-12-15 2012-04-05 3
12 2 medications No 2010-12-15 2012-04-05 3
13 2 transferring Yes 2009-10-10 2012-04-05 5
14 2 preparing No 2009-10-10 2012-04-05 5
15 2 medications No 2009-10-10 2012-04-05 5
16 3 medications No 2008-10-10 2010-04-07 3
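For reference, here is a minimal self-contained sketch of the same idea that should reproduce the 6month values above. It rests on a few assumptions that are not stated in the answer: the dates are parsed as year-day-month (which is what the table above suggests), the sample is reduced to one row per patient and Answerdate, and the six-month length is written as an explicit pd.Timedelta of half an average Gregorian year instead of np.timedelta64(6, 'M'), since recent pandas versions may reject month-based timedelta units.

```python
import numpy as np
import pandas as pd

# Reduced sample: one row per (PatientNumber, Answerdate) pair from the question.
data = pd.DataFrame({
    "PatientNumber": [1, 2, 2, 2, 2, 3],
    "Answerdate": ["2017-03-03", "2011-05-10", "2011-15-03",
                   "2010-15-12", "2009-10-10", "2008-10-10"],
    "DiagnosisDate": ["2018-05-03", "2012-05-04", "2012-05-04",
                      "2012-05-04", "2012-05-04", "2010-07-04"],
})

# Assumption: the strings are year-day-month, as the output table suggests.
for col in ["Answerdate", "DiagnosisDate"]:
    data[col] = pd.to_datetime(data[col], format="%Y-%d-%m")

# Assumption: "six months" is half an average Gregorian year (about 182.6 days).
HALF_YEAR = pd.Timedelta(days=365.2425 / 2)

# Time from the answer to the day before diagnosis, in units of six months, rounded up.
elapsed = data["DiagnosisDate"] - pd.Timedelta(days=1) - data["Answerdate"]
data["6month"] = np.ceil(elapsed / HALF_YEAR).astype(int)

print(data)
```

All original columns are kept; only the 6month column is added.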
Also, I would not use pd.date_range, as it does not seem to act the way you want, but I might be wrong.
EDIT: to remove the cases where DiagnosisDate is before Answerdate, once you have created the column 6month, just do data = data[data['6month'] > 0], as the value would be negative or zero in that case.
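If the approximation matters, one possible variant is to count calendar months instead of days. This is only a sketch and not part of the approach above: six_month_index is a hypothetical helper name, the dates are assumed to be already parsed, and month-end clipping (for example going back six months from the 31st) is ignored.

```python
import pandas as pd

def six_month_index(answer: pd.Series, diagnosis: pd.Series) -> pd.Series:
    """Hypothetical helper: 1 = first six months before diagnosis, 2 = second, ..."""
    # Calendar months between the two dates, ignoring the day of month.
    months = ((diagnosis.dt.year - answer.dt.year) * 12
              + (diagnosis.dt.month - answer.dt.month))
    # A later day of month on the diagnosis side means one more (partial) month has elapsed.
    months = months + (diagnosis.dt.day > answer.dt.day).astype(int)
    # Integer ceiling of months / 6; values <= 0 mean the answer is on or after the diagnosis.
    return (months + 5) // 6

data = pd.DataFrame({
    "Answerdate": pd.to_datetime(["2017-03-03", "2011-10-05", "2009-10-10", "2012-03-01"]),
    "DiagnosisDate": pd.to_datetime(["2018-03-05", "2012-04-05", "2012-04-05", "2012-09-01"]),
})
data["6month"] = six_month_index(data["Answerdate"], data["DiagnosisDate"])

# Same filter as in the EDIT above: drop answers given on or after the diagnosis.
data = data[data["6month"] > 0]
print(data)
```

With this variant an answer given exactly six calendar months before the diagnosis still counts as period 1, whereas the day-based division can push it into period 2 when those six months happen to be longer than average.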
Source: https://stackoverflow.com/questions/50862198/date-range-for-six-monthly-in-pandas