Is there a pythonic way to do a contingency table in Pandas?

后端 未结 4 615
傲寒
傲寒 2020-12-30 00:09

Given a dataframe that looks like this:

            A   B      
2005-09-06  5  -2  
2005-09-07 -1   3  
2005-09-08  4   5 
2005-09-09 -8   2
2005-09-10 -2  -         


        
相关标签:
4条回答
  • 2020-12-30 00:40

    Here's a really useful page about the pandas crosstab function:

    http://chrisalbon.com/python/pandas_crosstabs.html

    So I think for what you'd like to do you should use

    import pandas as pd
    pd.crosstab(data['A']>0, data['B']>0)
    

    Hope that helps!

    0 讨论(0)
  • 2020-12-30 00:45

    Let us call your dataframe data. Try

    a = data['A']>0
    b = data['B']>0
    data.groupby([a,b]).count() 
    
    0 讨论(0)
  • 2020-12-30 00:46

    Probably easiest to just use the pandas function crosstab. Borrowing from Dyno Fu above:

    import pandas as pd
    from StringIO import StringIO
    table = """dt          A   B
    2005-09-06  5  -2
    2005-09-07 -1   3
    2005-09-08  4   5
    2005-09-09 -8   2
    2005-09-10 -2  -5
    2005-09-11 -7   9
    2005-09-12  2   8
    2005-09-13  6  -5
    2005-09-14  6  -5
    """
    sio = StringIO(table)
    df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
    df.set_index("dt", inplace=True)
    
    pd.crosstab(df.A > 0, df.B > 0)
    

    Output:

    B      False  True 
    A                  
    False      1      3
    True       3      2
    
    [2 rows x 2 columns]
    

    Also the table is usable if you want to do a Fisher exact test with scipy.stats etc:

    from scipy.stats import fisher_exact
    tab = pd.crosstab(df.A > 0, df.B > 0)
    fisher_exact(tab)
    
    0 讨论(0)
  • 2020-12-30 00:47
    import pandas as pd
    from StringIO import StringIO
    
    table = """dt          A   B
    2005-09-06  5  -2
    2005-09-07 -1   3
    2005-09-08  4   5
    2005-09-09 -8   2
    2005-09-10 -2  -5
    2005-09-11 -7   9
    2005-09-12  2   8
    2005-09-13  6  -5
    2005-09-14  6  -5
    """
    sio = StringIO(table)
    df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
    df.set_index("dt", inplace=True)
    
    a = df['A'] > 0
    b = df['B'] > 0
    df1 = df.groupby([a,b]).count()
    print df1["A"].unstack()
    

    output:

    B      False  True
    A
    False      1      3
    True       3      2
    

    this is just lnanenok's answer and using unstack() to make it more readable. credit should go to lanenok.

    0 讨论(0)
提交回复
热议问题