Co-occurrence matrix from nested list of words


无人及你 2020-11-30 10:27

I have a list of names like:

names = ['A', 'B', 'C', 'D']

and a list of documents, where in each document some of these names are m

8 answers

时光取名叫无心 (original poster)

2020-11-30 10:35

You can also compute the co-occurrence matrix with a sparse matrix product. This should scale well when the vocabulary is large.

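import scipy.sparse as sp

# Map each name to a column index.
voc2id = dict(zip(names, range(len(names))))
rows, cols, vals = [], [], []
for r, d in enumerate(document):
    for e in d:
        # Skip words that are not in the vocabulary.
        if e in voc2id:
            rows.append(r)
            cols.append(voc2id[e])
            vals.append(1)
# Document-by-name count matrix. Pass the shape explicitly so that names
# which never occur still get a column.
X = sp.csr_matrix((vals, (rows, cols)), shape=(len(document), len(names)))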

Now you can get the co-occurrence matrix by multiplying X.T by X:

Xc = X.T @ X  # co-occurrence matrix
Xc.setdiag(0)  # ignore each name's co-occurrence with itself
print(Xc.toarray())
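
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The sample `documents` list and the `cooccurrence` helper name are my own assumptions for illustration, not something from the question. One deliberate difference: it counts each name at most once per document (via a set), so every entry is the number of documents in which two names appear together, even if a name is repeated inside a document.

```python
import numpy as np
import scipy.sparse as sp


def cooccurrence(names, documents):
    """Return a sparse names-by-names matrix of document co-occurrence counts."""
    voc2id = {name: i for i, name in enumerate(names)}
    rows, cols = [], []
    for r, doc in enumerate(documents):
        # Use a set so a name repeated within one document is counted once.
        for c in {voc2id[w] for w in doc if w in voc2id}:
            rows.append(r)
            cols.append(c)
    vals = np.ones(len(rows), dtype=np.int64)
    # Binary document-by-name incidence matrix with an explicit shape.
    X = sp.csr_matrix((vals, (rows, cols)), shape=(len(documents), len(names)))
    Xc = X.T @ X          # entry (i, j) = number of documents containing both i and j
    Xc.setdiag(0)         # drop self co-occurrence
    Xc.eliminate_zeros()  # remove the explicit zeros left on the diagonal
    return Xc


names = ['A', 'B', 'C', 'D']
# Hypothetical sample data; 'K' and 'Z' are not in names and are ignored.
documents = [['A', 'B'], ['C', 'B', 'K'], ['A', 'B', 'C', 'D', 'Z']]

result = cooccurrence(names, documents).toarray()
print(result)
```

With this sample data, the entry for A and B is 2 because they appear together in the first and third documents, while A and D share only the third. The result stays sparse until you call `toarray()`, which is the point of the approach for a large vocabulary.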
