I have a list of names like:
```python
names = ['A', 'B', 'C', 'D']
```
and a list of documents, in each of which some of these names are mentioned.
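Co-occurrence might only need to mean "both names appear somewhere in the same document", with no word window. In that case, one minimal sketch is to build a binary document-by-name matrix and multiply its transpose by itself. The sample documents are made up for illustration. Splitting on whitespace is an assumption about how the names appear in the text.

```python
import pandas as pd

names = ['A', 'B', 'C', 'D']
# Hypothetical sample documents, for illustration only.
docs = pd.Series([
    "A met B and C",
    "B talked to D",
    "A and B again",
])

# Binary document-by-name matrix: 1 if the name occurs in the document.
token_sets = [set(doc.split()) for doc in docs]
presence = pd.DataFrame(
    [[int(name in tokens) for name in names] for tokens in token_sets],
    columns=names,
)

# X^T X counts, for every pair of names, the documents that contain both.
co_doc = presence.T.dot(presence)
# The diagonal would hold per-name document counts; zero it out.
for name in names:
    co_doc.loc[name, name] = 0
print(co_doc)
```

With these sample documents, `A` and `B` share two documents, so `co_doc.loc['A', 'B']` is 2.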
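Here is the code I use to build the co-occurrence matrix, with a window of 5 words on each side of each target word:

```python
import pandas as pd
from tqdm import tqdm

# data_corpus: a pandas Series of text documents
# words: the list of words (e.g. the names above) to build the matrix for
# co_oc: the co-occurrence matrix, counting pairs of words that appear
#        within WINDOW tokens of each other
WINDOW = 5
word_set = set(words)  # fast membership checks
# Start every cell at 0: an empty DataFrame is filled with NaN, and NaN + 1 stays NaN.
co_oc = pd.DataFrame(0, index=words, columns=words)
for doc in tqdm(data_corpus):
    tokens = doc.split()
    for i, target in enumerate(tokens):
        if target not in word_set:
            continue
        # Clamp the window to the document boundaries so that a negative
        # start index does not wrap around to the end of the list.
        start = max(0, i - WINDOW)
        end = min(len(tokens), i + WINDOW + 1)
        for m in range(start, end):
            if m != i and tokens[m] in word_set:
                # .loc[row, col] updates the frame itself; chained indexing
                # (co_oc[a][b] += 1) may write to a copy instead.
                co_oc.loc[target, tokens[m]] += 1
print(co_oc.head())
```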
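On a large corpus, updating a DataFrame one cell at a time with `.loc` is slow. A common alternative is to count pairs in a `collections.Counter` first and build the DataFrame once at the end. This sketch keeps the same assumptions as the code above: whitespace tokens, a window of 5 words on each side, and `words` as the vocabulary. The function name `window_cooccurrence` and the sample corpus are hypothetical.

```python
from collections import Counter

import pandas as pd


def window_cooccurrence(corpus, words, window=5):
    """Count how often two vocabulary words appear within `window` tokens."""
    vocab = set(words)
    counts = Counter()
    for doc in corpus:
        tokens = doc.split()
        for i, target in enumerate(tokens):
            if target not in vocab:
                continue
            # Clamp the window to the document boundaries.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in vocab:
                    counts[(target, tokens[j])] += 1
    # Build the matrix once, touching only the pairs that actually occurred.
    matrix = pd.DataFrame(0, index=words, columns=words)
    for (row, col), n in counts.items():
        matrix.loc[row, col] = n
    return matrix


# Hypothetical sample corpus: A and C are 9 tokens apart, outside the window.
corpus = pd.Series(["A x B y y y y y y C", "B A"])
co_oc = window_cooccurrence(corpus, ['A', 'B', 'C', 'D'], window=5)
print(co_oc)
```

Each pair is counted once from each side, so the resulting matrix is symmetric.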