Pandas - Conditional Probability of a given specific b

孤独总比滥情好 2021-01-03 02:32

I have a DataFrame with two columns, "a" and "b". How can I find the conditional probability of "a" given a specific "b"?

df.groupby('a').groupby(


        
5 answers
  •  夕颜 (OP)
    2021-01-03 03:17

    To count how many times each value of b occurs within each value of a, use:

    df.groupby('a').b.value_counts()
    

    For example, create a DataFrame as below:

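    import numpy as np
    import pandas as pd

    # C and D are random filler columns; only A and B matter for the counts below.
    df = pd.DataFrame({
        'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
        'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
        'C': np.random.randn(8),
        'D': np.random.randn(8),
    })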
    
         A      B         C         D
    0  foo    one -1.565185 -0.465763
    1  bar    one  2.499516 -0.941229
    2  foo    two -0.091160  0.689009
    3  bar  three  1.358780 -0.062026
    4  foo    two -0.800881 -0.341930
    5  bar    two -0.236498  0.198686
    6  foo    one -0.590498  0.281307
    7  foo  three -1.423079  0.424715
    

    Then:

    df.groupby('A')['B'].value_counts()
    
    A
    bar  one      1
         two      1
         three    1
    foo  one      2
         two      2
         three    1
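
    The same counts can also be laid out as a contingency table, which some readers find easier to scan. The sketch below assumes only the A and B columns from the example; the name `table` is introduced here for illustration.

```python
import pandas as pd

# Same A/B values as the example above; C and D are not needed for counting.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
})

counts = df.groupby('A')['B'].value_counts()

# Pivot the inner index level (the B values) into columns;
# combinations that never occur would show up as 0.
table = counts.unstack(fill_value=0)
print(table)
```

    Each row of `table` is one value of A, and each column is one value of B.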
    

    To convert these counts to conditional probabilities, divide each count by the total size of its A group. Note that this gives P(B | A); for P(A | B), swap the two columns in the groupby.
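
    Written out, with N(a, b) the number of rows where A = a and B = b, and N(a) the number of rows where A = a (notation introduced here for illustration, not taken from the original answer):

```latex
P(B = b \mid A = a) = \frac{N(a, b)}{N(a)},
\qquad \text{e.g.} \quad
P(B = \text{one} \mid A = \text{foo}) = \frac{2}{5} = 0.4
```

    The worked value uses the counts listed above: foo appears 5 times, 2 of them with B = one.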

    You can either do it with another groupby:

    df.groupby('A')['B'].value_counts() / df.groupby('A')['B'].count()
    
    A
    bar  one      0.333333
         two      0.333333
         three    0.333333
    foo  one      0.400000
         two      0.400000
         three    0.200000
    dtype: float64
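
    The division above relies on pandas aligning the two results on the shared index level A. As a hedged sketch, the same step can be written with `Series.div` and an explicit `level`, which makes the alignment visible; the names `counts`, `sizes`, and `probs` are illustrative.

```python
import pandas as pd

# Same A/B values as the example above.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
})

counts = df.groupby('A')['B'].value_counts()  # joint counts per (A, B) pair
sizes = df.groupby('A')['B'].count()          # non-null rows per A group

# Broadcast the per-A sizes across the inner (B) level of `counts`.
probs = counts.div(sizes, level='A')

# Sanity check: the probabilities within each A group should sum to 1.
group_sums = probs.groupby(level='A').sum()
print(probs)
print(group_sums)
```

    Both `value_counts()` and `count()` skip missing values by default, so the numerators and denominators stay consistent if B contains NaN.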
    

    Or you can apply a lambda function to each group (using the example's column names A and B):

    df.groupby('A')['B'].apply(lambda g: g.value_counts() / len(g))
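
    A shorter route, offered as a sketch rather than as part of the original answer: `value_counts` on a grouped column accepts `normalize=True`, which performs the division itself. Swapping the two columns gives the direction the question asks about, the distribution of A given one specific value of B. The variable names are illustrative.

```python
import pandas as pd

# Same A/B values as the example above.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
})

# P(B | A): normalize=True divides each count by its A-group total.
p_b_given_a = df.groupby('A')['B'].value_counts(normalize=True)

# P(A | B): group by B instead, then pick one specific value of B.
p_a_given_b = df.groupby('B')['A'].value_counts(normalize=True)
p_a_given_one = p_a_given_b.loc['one']
print(p_a_given_one)
```

    Here `p_a_given_one` is indexed by the values of A, so `p_a_given_one['foo']` is the estimated probability that A is foo when B is one.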
    
