COUNTIF in pandas Python over multiple columns with multiple conditions

旧时难觅i 2020-12-06 06:58

I have a dataset wherein I am trying to determine the number of risk factors per person. So I have the following data:

Person_ID  Age  Smoker  Diabetes
        
3 Answers
  •  时光说笑
    2020-12-06 07:38

    I would do this the following way.

    1. For each column, create a new boolean series using the column's condition
    2. Add those series row-wise

    (Note that this is simpler if your Smoker and Diabetes columns are already boolean (True/False) instead of strings.)
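
Following that note, here is a minimal sketch that converts the string flags to booleans once, up front. It assumes every value in those columns is exactly 'Y' or 'N'. After the conversion, only the Age condition needs an explicit comparison, and the flag columns can be summed directly:

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 45, 27, 18, 55],
                   'Smoker': ['Y', 'N', 'N', 'Y', 'Y'],
                   'Diabetes': ['N', 'N', 'Y', 'Y', 'Y']})

# Assumption: the flag columns contain only 'Y' or 'N'.
# Convert them to real booleans once ('Y' -> True, anything else -> False).
flags = ['Smoker', 'Diabetes']
df[flags] = df[flags].eq('Y')

# Boolean flags can be summed as-is (True counts as 1);
# only Age still needs a comparison, converted to int before adding.
df['Risk_Factors'] = (df['Age'] > 45).astype(int) + df[flags].sum(axis=1)
print(df)
```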

    With the columns left as strings, it might look like this:

    import pandas as pd

    df = pd.DataFrame({'Age': [30, 45, 27, 18, 55],
                       'Smoker': ['Y', 'N', 'N', 'Y', 'Y'],
                       'Diabetes': ['N', 'N', 'Y', 'Y', 'Y']})
    print(df)
    #    Age Smoker Diabetes
    # 0   30      Y        N
    # 1   45      N        N
    # 2   27      N        Y
    # 3   18      Y        Y
    # 4   55      Y        Y

    # Step 1: one boolean Series per condition
    risk1 = df.Age > 45
    risk2 = df.Smoker == "Y"
    risk3 = df.Diabetes == "Y"
    risk_df = pd.concat([risk1, risk2, risk3], axis=1)
    print(risk_df)
    #      Age Smoker Diabetes
    # 0  False   True    False
    # 1  False  False    False
    # 2  False  False     True
    # 3  False   True     True
    # 4   True   True     True

    # Step 2: sum the booleans row-wise (True counts as 1, False as 0)
    df['Risk_Factors'] = risk_df.sum(axis=1)
    print(df)
    #    Age Smoker Diabetes  Risk_Factors
    # 0   30      Y        N             1
    # 1   45      N        N             0
    # 2   27      N        Y             1
    # 3   18      Y        Y             2
    # 4   55      Y        Y             3
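
The same two steps generalize to any number of columns, much like a row-wise COUNTIF. Here is a minimal sketch, assuming each condition is written as a function that takes a column (Series) and returns a boolean Series. The helper name `count_conditions` is hypothetical, not part of pandas:

```python
import pandas as pd
from typing import Callable, Dict


def count_conditions(df: pd.DataFrame,
                     conditions: Dict[str, Callable[[pd.Series], pd.Series]]) -> pd.Series:
    """Hypothetical helper: count, per row, how many column conditions hold."""
    # Step 1: evaluate each condition on its column -> one boolean Series each.
    checks = [cond(df[col]) for col, cond in conditions.items()]
    # Step 2: place the boolean Series side by side and sum across each row;
    # sum() treats True as 1 and False as 0.
    return pd.concat(checks, axis=1).sum(axis=1)


df = pd.DataFrame({'Age': [30, 45, 27, 18, 55],
                   'Smoker': ['Y', 'N', 'N', 'Y', 'Y'],
                   'Diabetes': ['N', 'N', 'Y', 'Y', 'Y']})

df['Risk_Factors'] = count_conditions(df, {
    'Age': lambda s: s > 45,
    'Smoker': lambda s: s == 'Y',
    'Diabetes': lambda s: s == 'Y',
})
print(df)
```

Summing with `sum(axis=1)` instead of chaining `+` is deliberate. For NumPy boolean arrays, `+` acts as a logical OR, so `risk1 + risk2 + risk3` can yield `True`/`False` rather than a count unless each Series is first converted with `.astype(int)`.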
    
