Group by one column and show the availability of specific values from another column

有些话、适合烂在心里 submitted on 2019-12-12 04:59:35

Question


I have this dataframe:

df1:

drug_id       illness
lexapro.1     HD
lexapro.1     MS
lexapro.2     HDED
lexapro.2     MS
lexapro.2     MS
lexapro.3     CD
lexapro.3     Sweat
lexapro.4     HD
lexapro.5     WD
lexapro.5     FN

I want to group the data by drug_id and check whether HD, MS, and FN occur in the illness column for each group, then fill in a second dataframe like this:

df2:
drug_id       HD      MS    FN
lexapro.1      1      1      0
lexapro.2      0      1      0   
lexapro.3      0      0      0
lexapro.4      1      0      0
lexapro.5      0      0      1

This is my code for grouping.

df1.groupby('drug_id', sort=False).isin('HD')

but I do not know how to assign 1 to df2['HD'] for each drug_id when 'HD' occurs for that drug_id in df1.

Thank you.
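
As far as I can tell, the attempt above fails because a GroupBy object does not expose an `isin` method. A minimal sketch that stays close to the same idea compares `illness` against each target value first and then asks, per `drug_id`, whether any row matched. The `targets` name and the hand-typed construction of `df1` are assumptions added for illustration, not part of the original post.

```python
import pandas as pd

# Sample data retyped from the question.
df1 = pd.DataFrame({
    "drug_id": ["lexapro.1", "lexapro.1", "lexapro.2", "lexapro.2", "lexapro.2",
                "lexapro.3", "lexapro.3", "lexapro.4", "lexapro.5", "lexapro.5"],
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})

targets = ["HD", "MS", "FN"]  # hypothetical name for the values of interest

# For each target: flag matching rows, then check per drug_id whether any row matched.
df2 = pd.DataFrame({
    t: df1["illness"].eq(t).groupby(df1["drug_id"], sort=False).any().astype(int)
    for t in targets
}).rename_axis("drug_id").reset_index()

print(df2)
```

Exact matching via `eq` matters here: `HDED` must not count as `HD`, which a substring test would get wrong.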


Answer 1:


Option 1
crosstab

pd.crosstab(df.drug_id, df.illness)[['HD', 'MS', 'FN']].ge(1).astype(int)

illness    HD  MS  FN
drug_id              
lexapro.1   1   1   0
lexapro.2   0   1   0
lexapro.3   0   0   0
lexapro.4   1   0   0
lexapro.5   0   0   1

Option 2
groupby + value_counts + unstack

df.groupby('drug_id').illness.value_counts()\
     .unstack()[['HD', 'MS', 'FN']].ge(1).astype(int)

illness    HD  MS  FN
drug_id              
lexapro.1   1   1   0
lexapro.2   0   1   0
lexapro.3   0   0   0
lexapro.4   1   0   0
lexapro.5   0   0   1

Option 3
get_dummies + sum

df.set_index('drug_id').illness.str.get_dummies()\
          .groupby(level=0).sum()[['HD', 'MS', 'FN']].ge(1).astype(int)

           HD  MS  FN
drug_id              
lexapro.1   1   1   0
lexapro.2   0   1   0
lexapro.3   0   0   0
lexapro.4   1   0   0
lexapro.5   0   0   1

Thanks to Scott Boston for the improvement!




Answer 2:


df.groupby(['drug_id','illness']).illness.count().unstack(-1).reindex(columns=['HD', 'MS', 'FN']).ge(1).astype(int)
Out[276]: 
illness    HD  MS  FN
drug_id              
lexapro.1   1   1   0
lexapro.2   0   1   0
lexapro.3   0   0   0
lexapro.4   1   0   0
lexapro.5   0   0   1


Source: https://stackoverflow.com/questions/46550638/group-by-one-column-and-show-the-availability-of-specific-values-from-another-co
