Co-occurrence matrix from nested list of words

无人及你 2020-11-30 10:27

I have a list of names like:

names = ['A', 'B', 'C', 'D']

and a list of documents, in each of which some of these names are mentioned.
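The question text is cut off at this point. If "co-occurrence" here means that two names appear in the same document (an assumption), the counts can be read off an incidence matrix. Multiplying its transpose by itself gives, for each pair of names, the number of documents that contain both. Below is a minimal pandas sketch under that assumption. The function `document_cooccurrence` and the sample `documents` are hypothetical and do not come from the question:

```python
import pandas as pd

def document_cooccurrence(names, documents):
    """Number of documents in which each pair of names appears together."""
    # incidence matrix: one row per document, one column per name,
    # 1 if the name occurs in that document (repeats count once)
    incidence = pd.DataFrame(
        [[int(name in present) for name in names] for present in map(set, documents)],
        columns=names,
    )
    # (names x docs) . (docs x names): cell (a, b) = documents containing both
    co = incidence.T.dot(incidence)
    # the diagonal would just count documents mentioning each name; zero it
    for name in names:
        co.loc[name, name] = 0
    return co

names = ['A', 'B', 'C', 'D']
# hypothetical documents, each already split into tokens
documents = [['A', 'B'], ['C', 'B', 'K'], ['A', 'B', 'C', 'D', 'Z']]
print(document_cooccurrence(names, documents))
```

Tokens that are not in `names` (such as `'K'` and `'Z'` here) are ignored, because only the listed names become columns.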

8 answers
  •  醉话见心
    2020-11-30 10:32

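    import pandas as pd
    from tqdm import tqdm

    def co_occurrence(data_corpus, words, window=5):
        '''Count how often each pair of `words` appears within `window` words
        of each other (5 on each side by default).

        data_corpus is the series consisting of text data; words is the list
        of words for which the co-occurrence matrix is built.'''
        # co_oc is the co-occurrence matrix; fill it with 0, not NaN,
        # otherwise `+= 1` leaves every cell as NaN
        co_oc = pd.DataFrame(0, index=words, columns=words)
        vocab = set(words)  # fast membership tests

        for text in tqdm(data_corpus):
            tokens = text.split()
            for i, word in enumerate(tokens):
                if word not in vocab:
                    continue
                # clamp the window to the document so that a negative start
                # index never wraps around to the end of the token list
                start = max(0, i - window)
                stop = min(len(tokens), i + window + 1)
                for j in range(start, stop):
                    if j != i and tokens[j] in vocab:
                        # .loc avoids chained assignment (co_oc[a][b] += 1)
                        co_oc.loc[word, tokens[j]] += 1
        return co_oc

    co_oc = co_occurrence(data_corpus, words)
    print(co_oc.head())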
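Updating a DataFrame cell by cell with `.loc` inside the innermost loop can be slow on a large corpus. One alternative is to accumulate windowed counts in a `collections.Counter` and then write them into the matrix once per distinct pair. The following is a minimal sketch of that approach. The function name `window_cooccurrence` and its `window` parameter are illustrative and are not taken from the answer:

```python
from collections import Counter

import pandas as pd

def window_cooccurrence(texts, words, window=5):
    """Windowed co-occurrence counts, accumulated in a Counter first."""
    vocab = set(words)
    pair_counts = Counter()
    for text in texts:
        tokens = text.split()
        for i, word in enumerate(tokens):
            if word not in vocab:
                continue
            # neighbours within `window` positions, clamped to the document
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i and tokens[j] in vocab:
                    pair_counts[word, tokens[j]] += 1
    # build the matrix once; pairs that were never seen stay 0
    matrix = pd.DataFrame(0, index=words, columns=words)
    for (a, b), n in pair_counts.items():
        matrix.loc[a, b] = n
    return matrix

print(window_cooccurrence(["a b c", "b a"], ["a", "b", "c"], window=1))
```

With `window=1`, only adjacent words are counted. As a result, in `"a b c"` the pair `a`/`c` gets no count, while `a`/`b` is counted once in each text.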
    
