Extract most important keywords from a set of documents


故里飘歌 2020-12-21 16:00

I have a set of 3,000 text documents and I want to extract the top 300 keywords (each could be a single word or a multi-word phrase).

I have tried the following approaches:

RAK

4 answers

盖世英雄少女心 (original poster)

2020-12-21 16:45

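import os
from collections import Counter

# Count how often each whitespace-separated token occurs
# across all regular files in the current directory.
words = Counter()
for name in os.listdir():
    if not os.path.isfile(name):
        continue  # skip subdirectories
    with open(name, "r", encoding="utf-8", errors="ignore") as fh:
        for line in fh:
            words.update(line.split())

# most_common returns (word, count) pairs, highest count first
top_words = words.most_common(300)

top_words now holds the 300 most frequent tokens, which are the words you want. Note that this ranks by raw frequency only, so very common words such as "the" will dominate unless you filter them out first.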
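Since the question also asks for multi-word keywords, the same counting idea can be stretched to short phrases. The following is only a rough sketch under some assumptions: the `STOP_WORDS` set is a tiny hypothetical placeholder, the helper names `tokenize`, `count_phrases` and `top_keywords` are made up for illustration, and "importance" is still plain frequency rather than any proper keyword-scoring method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_phrases(texts, max_len=2):
    """Count every run of 1..max_len consecutive tokens that has no stop word."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if any(tok in STOP_WORDS for tok in gram):
                    continue  # phrase contains a stop word, skip it
                counts[" ".join(gram)] += 1
    return counts


def top_keywords(texts, k=300, max_len=2):
    """Return the k most frequent words/phrases, most frequent first."""
    return [phrase for phrase, _ in count_phrases(texts, max_len).most_common(k)]


if __name__ == "__main__":
    docs = ["Machine learning is fun", "Machine learning for text", "Deep learning"]
    print(top_keywords(docs, k=5))
```

To run it on the files from the answer above, read each file's contents into a list of strings and pass that list as `texts`. Raising `max_len` allows longer phrases at the cost of more candidates to count.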
