Exclude a specific date based on a condition using pandas

Posted by 纵饮孤独 on 2020-07-13 15:10:13

Question


import pandas as pd

df2 = pd.DataFrame({'person_id':[11,11,11,11,11,12,12,13,13,14,14,14,14],
                    'admit_date':['01/01/2011','01/01/2009','12/31/2013','12/31/2017','04/03/2014','08/04/2016',
                                  '03/05/2014','02/07/2011','08/08/2016','12/31/2017','05/01/2011','05/21/2014','07/12/2016']})
# Reshape to long format (one row per person and date), then parse the strings as datetimes
df2 = df2.melt('person_id', value_name='dates')
df2['dates'] = pd.to_datetime(df2['dates'])

What I would like to do is

a) Exclude (filter out) records from the data frame if a subject has both Dec 31st and Jan 1st in its records. Note that the year doesn't matter.

If a subject has only one of Dec 31st or Jan 1st, we leave its records as they are.

But if a subject has both Dec 31st and Jan 1st, we remove one of them (either Dec 31st or Jan 1st). Note that a subject could also have multiple entries with the same date, like person_id = 11.

So far I could only do the following:

# This drops the exact date 2017-12-31, even for a subject who has only Dec 31st in their records.
# How can I compare on month and day only and ignore the year?
df2_new = df2['dates'] != '2017-12-31'
df2[df2_new]

My expected output is as follows:

For person_id = 11, we drop 12-31 because that person has both 12-31 and 01-01 in their records, whereas for person_id = 14, we don't drop 12-31 because that person has only 12-31 in their records.

We drop 12-31 only when both 12-31 and 01-01 appear in a person's records.


Answer 1:


Use:

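import numpy as np

s = df2['dates'].dt.strftime('%m-%d')  # month-day string, so the year is ignored
m1 = s.eq('01-01').groupby(df2['person_id']).transform('any')  # person has a Jan 1st record
m2 = s.eq('12-31').groupby(df2['person_id']).transform('any')  # person has a Dec 31st record
# If a person has both dates, keep only the rows that are not Dec 31st; otherwise keep every row
m3 = np.select([m1 & m2, m1 | m2], [s.ne('12-31'), True], default=True)
df3 = df2[m3]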

Result:

# print(df3)
    person_id    variable      dates
0          11  admit_date 2011-01-01
1          11  admit_date 2009-01-01
4          11  admit_date 2014-04-03
5          12  admit_date 2016-08-04
6          12  admit_date 2014-03-05
7          13  admit_date 2011-02-07
8          13  admit_date 2016-08-08
9          14  admit_date 2017-12-31
10         14  admit_date 2011-05-01
11         14  admit_date 2014-05-21
12         14  admit_date 2016-07-12
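
The same rule can also be expressed as a single "rows to drop" mask instead of np.select: a row is dropped only when it falls on Dec 31st and its person has at least one Jan 1st record. The following is a minimal sketch under that reading of the question, not the answerer's code; the helper name drop_dec31_if_both and its parameters are hypothetical, and the explicit format argument to pd.to_datetime is an added assumption about the input strings.

```python
import pandas as pd


def drop_dec31_if_both(df, id_col="person_id", date_col="dates"):
    """Drop Dec 31st rows for ids that also have a Jan 1st row (year ignored).

    Hypothetical helper for illustration only.
    """
    month_day = df[date_col].dt.strftime("%m-%d")
    is_dec31 = month_day.eq("12-31")
    # Per-row flag: does this row's person have any Jan 1st record?
    has_jan1 = month_day.eq("01-01").groupby(df[id_col]).transform("any")
    # A Dec 31st row is dropped only if its person also has Jan 1st
    return df[~(is_dec31 & has_jan1)]


df2 = pd.DataFrame({
    "person_id": [11, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14],
    "admit_date": ["01/01/2011", "01/01/2009", "12/31/2013", "12/31/2017",
                   "04/03/2014", "08/04/2016", "03/05/2014", "02/07/2011",
                   "08/08/2016", "12/31/2017", "05/01/2011", "05/21/2014",
                   "07/12/2016"],
})
df2 = df2.melt("person_id", value_name="dates")
df2["dates"] = pd.to_datetime(df2["dates"], format="%m/%d/%Y")

df3 = drop_dec31_if_both(df2)
print(df3)
```

On the sample data this keeps the same rows as the result shown above: both Dec 31st rows of person_id = 11 are removed, while the Dec 31st row of person_id = 14 stays.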



Answer 2:


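Another way:

Convert each date to a day-month string. Create a temporary column in which 31 Dec is converted to 01 Jan, then drop duplicates by person_id, variable, and the temporary column, keeping the first. Note that this overwrites dates with strings and also collapses any repeated day-month value within a person, such as the two 01 Jan rows of person_id = 11.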

import numpy as np
df2['dates'] = df2['dates'].dt.strftime('%d %b')
df2 = (df2.assign(check=np.where(df2.dates == '31 Dec', '01 Jan', df2.dates))
          .drop_duplicates(['person_id', 'variable', 'check'], keep='first').drop(columns=['check']))



Result (shown with the temporary check column still present):

    person_id    variable   dates   check
0          11  admit_date  01 Jan  01 Jan
4          11  admit_date  03 Apr  03 Apr
5          12  admit_date  04 Aug  04 Aug
6          12  admit_date  05 Mar  05 Mar
7          13  admit_date  07 Feb  07 Feb
8          13  admit_date  08 Aug  08 Aug
9          14  admit_date  31 Dec  01 Jan
10         14  admit_date  01 May  01 May
11         14  admit_date  21 May  21 May
12         14  admit_date  12 Jul  12 Jul
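
If the full dates need to be preserved, the same de-duplication idea can be applied to a separate key instead of overwriting the dates column. This is only a sketch of that variation, not the answerer's code: the names month_day, key, is_dup and df4 are made up for illustration, and it uses the numeric '%m-%d' format rather than '%d %b' so that the comparison does not depend on locale-specific month names.

```python
import pandas as pd

df2 = pd.DataFrame({
    "person_id": [11, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14],
    "admit_date": ["01/01/2011", "01/01/2009", "12/31/2013", "12/31/2017",
                   "04/03/2014", "08/04/2016", "03/05/2014", "02/07/2011",
                   "08/08/2016", "12/31/2017", "05/01/2011", "05/21/2014",
                   "07/12/2016"],
})
df2 = df2.melt("person_id", value_name="dates")
df2["dates"] = pd.to_datetime(df2["dates"], format="%m/%d/%Y")

# Month-day key in which Dec 31st is treated as if it were Jan 1st
month_day = df2["dates"].dt.strftime("%m-%d")
key = month_day.where(month_day != "12-31", "01-01")

# Flag every row whose (person_id, key) pair has already been seen
is_dup = pd.DataFrame({"person_id": df2["person_id"], "key": key}).duplicated(keep="first")

# The dates column keeps its datetime dtype; only duplicate rows are removed
df4 = df2[~is_dup]
print(df4)
```

As with the original version, this keeps only the first row per person and key, so for person_id = 11 the second Jan 1st row (index 1) is removed along with the Dec 31st rows. If repeated Jan 1st entries must survive, the mask-based approach in Answer 1 is the closer match to the question.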


Source: https://stackoverflow.com/questions/62635778/exclude-a-specific-date-based-on-a-condition-using-pandas
