Aggregate all dataframe row pair combinations using pandas

你的背包 2020-12-19 04:16

I use Python pandas to perform grouping and aggregation across data frames, but I would now like to perform specific pairwise aggregation of rows (n choose 2, statistical co

2 answers
  •  梦毁少年i
    2020-12-19 04:56

    I can't think of a clever vectorized way to do this, but unless performance is a real bottleneck I tend to use the simplest thing which makes sense. In this case, I might set_index("Gene") and then use loc to pick out the rows:

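    >>> from itertools import combinations
    >>> import pandas as pd
    >>> # mygenes is the list of gene names to pair up (ABC1..ABC4 in the output below)
    >>> df = df.set_index("Gene")
    >>> cc = list(combinations(mygenes, 2))
    >>> # list(c) makes loc select both rows of the pair; sum() adds them column-wise
    >>> out = pd.DataFrame([df.loc[list(c), :].sum() for c in cc], index=cc)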
    >>> out
                  case1  case2  control1  control2
    (ABC1, ABC2)      1      2         0         1
    (ABC1, ABC3)      1      2         1         1
    (ABC1, ABC4)      0      1         1         2
    (ABC2, ABC3)      2      2         1         0
    (ABC2, ABC4)      1      1         1         1
    (ABC3, ABC4)      1      1         2         1
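
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The data is made up for illustration (the gene names G1..G3 and the counts are hypothetical, not the table from the question). It also includes a NumPy variant that adds the rows of every pair in one step; that variant is not part of the approach above, just one possible alternative if the loop ever becomes a bottleneck.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical data: the real table from the question is not reproduced here.
df = pd.DataFrame(
    {
        "Gene": ["G1", "G2", "G3"],
        "case1": [1, 0, 1],
        "case2": [0, 1, 1],
        "control1": [0, 0, 1],
    }
).set_index("Gene")

genes = list(df.index)
pairs = list(combinations(genes, 2))  # all n-choose-2 gene pairs

# Loop version: select the two rows of each pair and sum them column-wise.
out_loop = pd.DataFrame([df.loc[list(p)].sum() for p in pairs], index=pairs)

# NumPy variant (an alternative, not from the answer): positional indices of
# every pair (i < j), in the same order that combinations() produces.
i, j = np.triu_indices(len(genes), k=1)
values = df.to_numpy()
out_vec = pd.DataFrame(values[i] + values[j], index=pairs, columns=df.columns)

print(out_loop)
print(out_vec)
```

Both frames should hold the same sums. The loop version is easier to adapt if the aggregation is something other than a sum, while the NumPy variant only works for operations that can be written as array arithmetic on the two row blocks.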
    
