Co-occurrence matrix from nested list of words

无人及你 2020-11-30 10:27

I have a list of names like:

names = ['A', 'B', 'C', 'D']

and a list of documents, where each document mentions some of these names.
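For reference, here is a minimal pure-Python baseline. It assumes each document is a list of words and that a pair of names should be counted once for every document that mentions both. The helper `cooccurrence` and the example documents are hypothetical. Words outside `names` (here 'K' and 'Z') are ignored.

```python
from collections import Counter
from itertools import combinations

names = ['A', 'B', 'C', 'D']
# Hypothetical example documents (one list of words per document).
document = [['A', 'B'], ['C', 'B', 'K'], ['A', 'B', 'C', 'D', 'Z']]


def cooccurrence(names, docs):
    """Hypothetical helper: count, for every pair of names, the documents mentioning both."""
    idx = {n: i for i, n in enumerate(names)}
    pair_counts = Counter()
    for doc in docs:
        # Keep only known names, once per document, in index order.
        present = sorted({idx[w] for w in doc if w in idx})
        pair_counts.update(combinations(present, 2))
    m = [[0] * len(names) for _ in names]
    for (i, j), c in pair_counts.items():
        m[i][j] = m[j][i] = c  # symmetric; the diagonal stays 0
    return m


for name, row in zip(names, cooccurrence(names, document)):
    print(name, row)
```

This loops over every pair inside each document, so it gets slow for long documents. That cost is what the sparse-matrix approach avoids.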

8 answers
  •  时光取名叫无心
    2020-11-30 10:35

    You can also build the co-occurrence matrix with sparse-matrix tricks. This should work well with a bigger vocabulary, since only the nonzero entries are stored.

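    import scipy.sparse as sp

    # `names` and `document` (the list of documents) come from the question.
    voc2id = {name: i for i, name in enumerate(names)}  # name -> column index
    rows, cols, vals = [], [], []
    for r, d in enumerate(document):      # one row per document
        for e in d:
            if e in voc2id:               # skip words that are not in `names`
                rows.append(r)
                cols.append(voc2id[e])
                vals.append(1)
    # Fix the shape so every name gets a column, even if it never appears.
    X = sp.csr_matrix((vals, (rows, cols)),
                      shape=(len(document), len(names)))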
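
    The small sketch below makes the layout of X concrete. It wraps the same construction in a hypothetical helper, `incidence_matrix`, and runs it on made-up documents in which 'D' never appears. Rows are documents and columns are names. Passing `shape=` keeps a column for every name. Without it, scipy infers the shape from the largest row and column index it sees.

```python
import scipy.sparse as sp


def incidence_matrix(names, docs):
    """Hypothetical helper: documents x names matrix of mention counts."""
    voc2id = {name: i for i, name in enumerate(names)}
    rows, cols = [], []
    for r, doc in enumerate(docs):
        for word in doc:
            if word in voc2id:            # ignore words outside `names`
                rows.append(r)            # row = document index
                cols.append(voc2id[word])  # column = name index
    vals = [1] * len(rows)
    # `shape` keeps one column per name, even for names that never occur.
    return sp.csr_matrix((vals, (rows, cols)), shape=(len(docs), len(names)))


names = ['A', 'B', 'C', 'D']
docs = [['A', 'B'], ['C', 'B', 'K']]  # hypothetical: 'D' never appears
X = incidence_matrix(names, docs)
print(X.shape)  # (2, 4); without `shape=`, scipy would infer (2, 3)
print(X.toarray())
```

    The coordinate-style constructor sums duplicate (row, col) pairs. As a result, a name mentioned twice in one document gets a 2 in that cell rather than a 1.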
    

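    Now you can get the co-occurrence matrix by simply multiplying X.T with X. Entry (i, j) of the product is the sum over documents d of X[d, i] * X[d, j]. When each name appears at most once per document, that is the number of documents mentioning both names i and j. The diagonal only pairs each name with itself, so it is set to zero: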

    Xc = X.T @ X    # co-occurrence matrix; `@` is always matrix multiplication
    Xc.setdiag(0)   # drop each name's co-occurrence with itself
    print(Xc.toarray())
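
    To check the identity behind the multiplication, the sketch below brute-forces the sum (X^T X)[i, j] = sum over d of X[d, i] * X[d, j] and compares it with the sparse product. The 0/1 matrix is hypothetical (3 documents, names A to D). `@` is used because it means matrix multiplication for both scipy sparse matrices and the newer sparse arrays such as `csr_array`. For sparse arrays, `*` is elementwise.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical 0/1 incidence matrix: 3 documents x 4 names (A, B, C, D).
X = sp.csr_matrix(np.array([[1, 1, 0, 0],
                            [0, 1, 1, 0],
                            [1, 1, 1, 1]]))

Xc = (X.T @ X).toarray()  # sparse product, converted to a dense array

# Brute-force the same sum over documents, entry by entry.
dense = X.toarray()
n_docs, n_names = dense.shape
brute = np.zeros((n_names, n_names), dtype=dense.dtype)
for d in range(n_docs):
    for i in range(n_names):
        for j in range(n_names):
            brute[i, j] += dense[d, i] * dense[d, j]

assert (Xc == brute).all()
print(np.diag(Xc))  # [2 3 2 1]: documents mentioning each name
np.fill_diagonal(Xc, 0)  # same effect as Xc.setdiag(0) on the sparse result
print(Xc)
```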
    
