Question
I have this dataframe:
df1:
drug_id illness
lexapro.1 HD
lexapro.1 MS
lexapro.2 HDED
lexapro.2 MS
lexapro.2 MS
lexapro.3 CD
lexapro.3 Sweat
lexapro.4 HD
lexapro.5 WD
lexapro.5 FN
I want to group the data by drug_id, check whether HD, MS, and FN appear in the illness column for each drug_id, and then fill in a second DataFrame like this:
df2:
drug_id HD MS FN
lexapro.1 1 1 0
lexapro.2 0 1 0
lexapro.3 0 0 0
lexapro.4 1 0 0
lexapro.5 0 0 1
This is my code for grouping.
df1.groupby('drug_id', sort=False).isin('HD')
but I do not know how to assign 1 to df2['HD'] for each drug_id when 'HD' is present for that drug_id in df1.
Thank you.
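A minimal sketch of the per-column assignment the question asks about, assuming pandas is imported as pd and that df1 is rebuilt from the listing above. `targets` is a hypothetical name for the illnesses to flag. For each target, compare `illness` against it exactly, group the resulting boolean Series by `drug_id`, and take `any()`; the result is indexed by drug_id, so assigning it to df2 lines up row by row.

```python
import pandas as pd

# Rebuild df1 from the listing in the question
df1 = pd.DataFrame({
    "drug_id": ["lexapro.1"] * 2 + ["lexapro.2"] * 3 + ["lexapro.3"] * 2
               + ["lexapro.4"] + ["lexapro.5"] * 2,
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})

targets = ["HD", "MS", "FN"]  # hypothetical name for the illnesses to flag

# One row per drug_id, in order of first appearance
df2 = pd.DataFrame(index=pd.Index(df1["drug_id"].unique(), name="drug_id"))

for ill in targets:
    # Exact comparison, so "HDED" is not counted as "HD"
    present = df1["illness"].eq(ill).groupby(df1["drug_id"]).any()
    # Assignment aligns on the drug_id index; True/False become 1/0
    df2[ill] = present.astype(int)

print(df2)
```

The loop is only practical for a handful of target illnesses; the answers below compute all columns in one pass.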
Answer 1:
Option 1: crosstab
pd.crosstab(df.drug_id, df.illness)[['HD', 'MS', 'FN']].ge(1).astype(int)
illness HD MS FN
drug_id
lexapro.1 1 1 0
lexapro.2 0 1 0
lexapro.3 0 0 0
lexapro.4 1 0 0
lexapro.5 0 0 1
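Selecting columns with [['HD', 'MS', 'FN']] assumes every target illness occurs somewhere in the data. A sketch of a more defensive variant, assuming the same df and a hypothetical `targets` list that includes a label ("XYZ") absent from the data: `reindex(columns=..., fill_value=0)` adds an all-zero column instead of raising a KeyError.

```python
import pandas as pd

df = pd.DataFrame({
    "drug_id": ["lexapro.1"] * 2 + ["lexapro.2"] * 3 + ["lexapro.3"] * 2
               + ["lexapro.4"] + ["lexapro.5"] * 2,
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})

# Hypothetical target list; "XYZ" does not occur anywhere in the data
targets = ["HD", "MS", "FN", "XYZ"]

# Count every (drug_id, illness) pair
counts = pd.crosstab(df["drug_id"], df["illness"])

# reindex adds a 0-filled column for any missing label, where
# counts[targets] would raise a KeyError
flags = counts.reindex(columns=targets, fill_value=0).ge(1).astype(int)
print(flags)
```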
Option 2: groupby + value_counts + unstack
df.groupby('drug_id').illness.value_counts()\
.unstack()[['HD', 'MS', 'FN']].ge(1).astype(int)
illness HD MS FN
drug_id
lexapro.1 1 1 0
lexapro.2 0 1 0
lexapro.3 0 0 0
lexapro.4 1 0 0
lexapro.5 0 0 1
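Why `.ge(1)` is enough here: `unstack` leaves NaN wherever a (drug_id, illness) pair never occurs, and NaN >= 1 evaluates to False. A small sketch, assuming the same df, that prints the intermediate counts before the comparison; `counts`, `subset`, and `flags` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "drug_id": ["lexapro.1"] * 2 + ["lexapro.2"] * 3 + ["lexapro.3"] * 2
               + ["lexapro.4"] + ["lexapro.5"] * 2,
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})

# Counts per (drug_id, illness), pivoted so each illness is a column
counts = df.groupby("drug_id")["illness"].value_counts().unstack()

# Pairs that never occur are NaN after unstack (so these columns are float)
subset = counts[["HD", "MS", "FN"]]
print(subset)

# NaN >= 1 evaluates to False, so missing pairs end up as 0
flags = subset.ge(1).astype(int)
print(flags)
```

Passing `fill_value=0` to `unstack` would give integer zeros instead of NaN; `.ge(1)` produces the same flags either way.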
Option 3: get_dummies + sum
# .groupby(level=0).sum() replaces .sum(level=0), which pandas 2.0 removed
df.set_index('drug_id').illness.str.get_dummies()\
.groupby(level=0).sum()[['HD', 'MS', 'FN']].ge(1).astype(int)
HD MS FN
drug_id
lexapro.1 1 1 0
lexapro.2 0 1 0
lexapro.3 0 0 0
lexapro.4 1 0 0
lexapro.5 0 0 1
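A variant of Option 3, sketched under the assumption of a recent pandas version and the same df: `str.get_dummies` already produces 0/1 indicator columns, so taking the per-drug `max()` instead of `sum()` yields presence flags directly and the `.ge(1).astype(int)` step can be dropped.

```python
import pandas as pd

df = pd.DataFrame({
    "drug_id": ["lexapro.1"] * 2 + ["lexapro.2"] * 3 + ["lexapro.3"] * 2
               + ["lexapro.4"] + ["lexapro.5"] * 2,
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})

# One 0/1 indicator column per distinct illness, indexed by drug_id
dummies = df.set_index("drug_id")["illness"].str.get_dummies()

# Collapse rows per drug_id; max of 0/1 indicators is 1 if the illness appears at all
flags = dummies.groupby(level=0).max()[["HD", "MS", "FN"]]
print(flags)
```

Because "HDED" gets its own indicator column, it does not leak into the HD flag for lexapro.2.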
Thanks to Scott Boston for the improvement!
Answer 2:
df.groupby(['drug_id','illness']).illness.count().unstack(-1)\
.reindex(columns=['HD', 'MS', 'FN']).ge(0).astype(int)
illness HD MS FN
drug_id
lexapro.1 1 1 0
lexapro.2 0 1 0
lexapro.3 0 0 0
lexapro.4 1 0 0
lexapro.5 0 0 1
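Answer 2's `.ge(0)` gives the right result only because `unstack` leaves NaN for missing (drug_id, illness) pairs and NaN compares as False. A sketch, assuming the same df and a hypothetical `targets` list, of what changes when the gaps are filled with 0 instead: every count is then at least 0, so the test has to be for a strictly positive count.

```python
import pandas as pd

df = pd.DataFrame({
    "drug_id": ["lexapro.1"] * 2 + ["lexapro.2"] * 3 + ["lexapro.3"] * 2
               + ["lexapro.4"] + ["lexapro.5"] * 2,
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})
targets = ["HD", "MS", "FN"]  # hypothetical name for the illnesses to flag

counts = (
    df.groupby(["drug_id", "illness"]).size()
    .unstack(fill_value=0)                    # absent pairs become 0 instead of NaN
    .reindex(columns=targets, fill_value=0)   # keep the targets, 0-filling any that never occur
)

# Every count is now >= 0, so ge(0) would flag everything; test for a positive count
flags = counts.gt(0).astype(int)
print(flags)
```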
Source: https://stackoverflow.com/questions/46550638/group-by-one-column-and-show-the-availability-of-specific-values-from-another-co