Constructing a co-occurrence matrix in python pandas


南笙 2020-11-28 04:29

I know how to do this in R, but is there a function in pandas that transforms a DataFrame into an n×n co-occurrence matrix containing the counts of two aspects co-occurring?

4 answers

时光说笑 (original poster)

2020-11-28 04:49
If you have a larger corpus and term-frequency matrix, sparse matrix multiplication may be more efficient. This uses the same matrix-multiplication trick referred to in another answer on this page.
```
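import scipy.sparse as sp

# df is assumed to be a row-by-term DataFrame of 0/1 indicators
X = sp.csr_matrix(df.astype(int).values)  # convert the DataFrame to a sparse matrix
Xc = X.T @ X  # sparse matrix product: entry (i, j) counts rows where terms i and j both occur
Xc.setdiag(0)  # zero the diagonal (each term paired with itself)
print(Xc.toarray())  # print the co-occurrence matrix in dense format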
```
Xc here is the co-occurrence matrix in a compressed sparse format; call `Xc.tocsr()` if you specifically need CSR.
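To make the idea concrete, here is a minimal, self-contained sketch of the same approach. The helper name `cooccurrence` and the toy DataFrame are hypothetical and not part of the original answer; the data is assumed to be 0/1 indicators with one row per document and one column per term. The intermediate conversion to LIL format is my own choice, made so that editing the diagonal is cheap.

```python
import pandas as pd
import scipy.sparse as sp


def cooccurrence(df):
    """Return a labelled term-by-term co-occurrence DataFrame.

    Hypothetical helper: assumes df holds 0/1 indicators,
    one row per document and one column per term.
    """
    X = sp.csr_matrix(df.astype(int).values)  # sparse copy of the data
    Xc = (X.T @ X).tolil()  # co-occurrence counts; LIL allows cheap diagonal edits
    Xc.setdiag(0)  # drop each term's co-occurrence with itself
    # Densify only at the end, to attach the column labels
    return pd.DataFrame(Xc.toarray(), index=df.columns, columns=df.columns)


# Toy data (illustrative only): 4 documents, 3 terms
df = pd.DataFrame({"a": [1, 1, 0, 1], "b": [1, 0, 1, 1], "c": [0, 0, 1, 1]})
result = cooccurrence(df)
print(result)
```

In this toy data, "a" and "b" appear together in two rows, "a" and "c" in one, and "b" and "c" in two. For very large vocabularies you would probably keep `Xc` sparse rather than converting it to a dense DataFrame.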