Pandas percentage by value in a column

死守一世寂寞 2020-12-03 06:17

I want to get the percentage of a particular value in a df column. Say I have a df with columns (col1, col2, col3, gender), where the gender column has values of M or F. I want to get the perc

5 Answers
  • 2020-12-03 07:07

    Finding the percentage of each class in the target column, to check whether it is imbalanced:

    g = data[Target_col_Y]
    df = pd.concat([g.value_counts(),              
    g.value_counts(normalize=True).mul(100)],axis=1,keys=('counts','percentage'))
    
    print (df)
    

       counts  percentage
    0   36548   88.734583
    1    4640   11.265417

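    Finding the largest difference in the percentage column, to check how much imbalance there is:

    df1 = df.diff(periods=1, axis=0)   # row-to-row differences; the first row is NaN
    # Largest absolute gap in the last column ('percentage'), in percentage points
    difvalue = df1[df1.columns[-1]].abs().max()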
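
    For anyone who wants to try this end to end, here is a minimal sketch of the same idea on made-up data. The `target` series and its 8/2 split are assumptions for illustration, not the data from this answer, and the gap is computed as max minus min rather than with `diff`.

```python
import pandas as pd

# Hypothetical toy target column: 8 zeros and 2 ones (not the answer's data).
target = pd.Series([0] * 8 + [1] * 2, name="target")

# Counts and percentages side by side, one row per class.
summary = pd.concat(
    [target.value_counts(), target.value_counts(normalize=True).mul(100)],
    axis=1,
    keys=("counts", "percentage"),
)

# Gap between the largest and smallest class share, in percentage points.
gap = summary["percentage"].max() - summary["percentage"].min()

print(summary)
print(gap)
```

    Using max minus min also works when there are more than two classes, where a row-to-row `diff` only compares neighbours.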
    
  • 2020-12-03 07:10
    # Assumes Gender is encoded as 0 = male, 1 = female, so the mean is the female share
    print('(Gender Male=0):\n{}%'.format(round(100 - df['Gender'].mean()*100, 2)))
    print('(Gender Female=1):\n{}%'.format(round(df['Gender'].mean()*100, 2)))
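
    This only works when the column is numeric with exactly the values 0 and 1, because the mean of such a column is the fraction of 1s. A small sketch on hypothetical data (the 0 = male, 1 = female encoding and the five rows are assumptions):

```python
import pandas as pd

# Hypothetical data: 0 = male, 1 = female (encoding assumed for illustration).
df = pd.DataFrame({"Gender": [0, 0, 0, 1, 1]})

# Mean of a 0/1 column is the share of 1s.
female_pct = round(df["Gender"].mean() * 100, 2)
male_pct = round(100 - female_pct, 2)

print("Male:", male_pct, "Female:", female_pct)
```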
    
  • 2020-12-03 07:14

    Let's say there are 200 values, of which 120 are categorized as M and 80 as F.

    1)

    df['gender'].value_counts()
    
     output:
    
     M=120
     F=80
    

    2)

    df['gender'].value_counts(normalize=True)
    
      output:
    
      M=0.60
      F=0.40
    

    3)

    df['gender'].value_counts(normalize=True) * 100  # converts the fractions to percentages
    
      output:
    
      M=60
      F=40
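
    The three steps above can be reproduced with a small runnable sketch. The DataFrame is built here just to match the 120/80 split described; note that the keyword is lowercase `normalize`.

```python
import pandas as pd

# Toy frame matching the example: 120 rows of 'M' and 80 rows of 'F'.
df = pd.DataFrame({"gender": ["M"] * 120 + ["F"] * 80})

counts = df["gender"].value_counts()                # step 1: raw counts
shares = df["gender"].value_counts(normalize=True)  # step 2: fractions that sum to 1
percentages = shares * 100                          # step 3: percentages

print(counts)
print(shares)
print(percentages)
```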
    
  • 2020-12-03 07:15

    If you only need the M and F values from the gender column, you can try using value_counts() together with count(), as follows:

    df = pd.DataFrame({'gender':['M','M','F', 'F', 'F']})
    # Percentage calculation
    (df['gender'].value_counts()/df['gender'].count())*100
    

    Result:

    F    60.0
    M    40.0
    Name: gender, dtype: float64
    

    Or, using groupby:

    (df.groupby('gender').size()/df['gender'].count())*100
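
    One thing worth checking with either form is how missing values are treated. As far as I know, with default settings `count()` skips missing values, `value_counts()` drops them, and `groupby` leaves them out of the groups, so both expressions give percentages of the non-missing rows. A sketch with one hypothetical missing entry added to the data above:

```python
import pandas as pd

# Same toy data as above, plus one missing value (added for illustration).
df = pd.DataFrame({"gender": ["M", "M", "F", "F", "F", None]})

non_missing = df["gender"].count()  # 5, the missing row is not counted

via_counts = df["gender"].value_counts() / non_missing * 100
via_groupby = df.groupby("gender").size() / non_missing * 100

print(via_counts)
print(via_groupby)
```

    If the percentages should be relative to all rows instead, dividing by `len(df)` is one option.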
    
  • 2020-12-03 07:17

    Use value_counts with normalize=True:

    df['gender'].value_counts(normalize=True) * 100
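
    Since the question asks about one particular value, here is a short sketch of two ways to pull out a single percentage. The five-row frame is made up for illustration.

```python
import pandas as pd

# Hypothetical data: 2 'M' and 3 'F'.
df = pd.DataFrame({"gender": ["M", "M", "F", "F", "F"]})

# Option 1: compute all percentages, then select the value of interest.
pct = df["gender"].value_counts(normalize=True) * 100
pct_m = pct["M"]

# Option 2: the mean of a boolean mask is the fraction of True values.
pct_m_alt = (df["gender"] == "M").mean() * 100

print(pct_m, pct_m_alt)
```

    Option 2 avoids a KeyError when the value does not occur in the column at all; it simply returns 0.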
    