Constructing a co-occurrence matrix in python pandas

南笙 2020-11-28 04:29

I know how to do this in R, but is there a function in pandas that transforms a dataframe into an n×n co-occurrence matrix containing the counts of two aspects co-occurring?

4 answers
  •  时光说笑
    2020-11-28 04:49

    If you have a larger corpus and term-frequency matrix, sparse matrix multiplication may be more efficient. I use the same matrix-multiplication trick referred to in algo's answer on this page.

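    import scipy.sparse as sp

    # df is assumed to be a 0/1 occurrence DataFrame (rows = observations, columns = terms)
    X = sp.csr_matrix(df.astype(int).values)  # convert the dataframe to a sparse matrix
    Xc = (X.T @ X).tocsr()  # sparse product: Xc[i, j] counts rows where columns i and j are both 1
    Xc.setdiag(0)  # reset the diagonal (each term's count with itself)
    Xc.eliminate_zeros()  # drop the explicit zeros that setdiag leaves stored
    print(Xc.todense())  # print the co-occurrence matrix in dense format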
    

    Xc here is the co-occurrence matrix, stored in a compressed sparse format.
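For anyone who wants to try this end to end, here is a minimal sketch on a toy dataframe. The data and the column names `a`, `b`, `c` are made up for illustration, and it assumes the dataframe holds 0/1 indicators (one row per observation, one column per aspect). It also wraps the sparse result back into a labelled dataframe and cross-checks it against the plain dense product `df.T.dot(df)`.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Hypothetical toy data: 1 means the aspect occurs in that row.
df = pd.DataFrame(
    {"a": [1, 1, 0, 1], "b": [1, 0, 1, 1], "c": [0, 0, 1, 1]}
)

# Sparse route: X.T @ X counts, for each pair of columns,
# the rows in which both columns are 1.
X = sp.csr_matrix(df.astype(int).to_numpy())
Xc = (X.T @ X).tocsr()
Xc.setdiag(0)          # ignore each aspect's co-occurrence with itself
Xc.eliminate_zeros()   # remove the explicit zeros left on the diagonal

# Put the labels back so the result reads like an n x n table.
cooc = pd.DataFrame(Xc.toarray(), index=df.columns, columns=df.columns)

# Dense cross-check using pandas only (fine for small data).
dense = df.T.dot(df).to_numpy().copy()
np.fill_diagonal(dense, 0)

print(cooc)
```

In this toy data, `a` and `b` appear together in two rows, `a` and `c` in one, and `b` and `c` in two, which is what the off-diagonal entries of `cooc` report. The dense version is simpler to read; the sparse version is worth it mainly when the matrix has many columns and mostly zeros.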
