Extract most important keywords from a set of documents

故里飘歌 2020-12-21 16:00

I have a set of 3000 text documents and I want to extract the top 300 keywords (each could be a single word or a multi-word phrase).

I have tried the approaches below:

RAK

4 answers
  •  盖世英雄少女心
    2020-12-21 16:45

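    import os
    import operator
    from collections import defaultdict

    # Count every whitespace-separated word across the files in the current directory.
    words = defaultdict(int)
    for file in os.listdir():
        if not os.path.isfile(file):
            continue  # skip subdirectories
        with open(file, "r") as open_file:  # the file is closed automatically
            for line in open_file:
                for word in line.split():
                    words[word] += 1

    # Sort by count, most frequent first, then keep the first 300 entries.
    sorted_words = sorted(words.items(), key=operator.itemgetter(1), reverse=True)
    top_words = sorted_words[:300]

    top_words now holds the 300 most frequent words with their counts; these are the words you want.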
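
Since the question also asks for multi-word keywords, and a raw frequency count tends to be dominated by very common function words, here is a rough sketch of one way the counting idea could be extended. It is not the original answer's code: the function name `top_keywords`, the tiny `STOP_WORDS` set, the tokenizing regex, and the sample `docs` are all assumptions made for illustration, and it works on in-memory strings rather than files.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}

def top_keywords(documents, k=300):
    """Return the k most frequent single words and two-word phrases."""
    counts = Counter()
    for text in documents:
        # Lowercase, then treat runs of letters, digits and apostrophes as tokens.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        # Single-word candidates: every token that is not a stop word.
        counts.update(t for t in tokens if t not in STOP_WORDS)
        # Two-word candidates: adjacent tokens, neither of which is a stop word.
        counts.update(
            f"{a} {b}"
            for a, b in zip(tokens, tokens[1:])
            if a not in STOP_WORDS and b not in STOP_WORDS
        )
    # most_common returns (keyword, count) pairs, highest count first.
    return counts.most_common(k)

# Made-up sample documents, only to show the call.
docs = [
    "Machine learning is the study of learning algorithms.",
    "Deep learning is a branch of machine learning.",
]
print(top_keywords(docs, k=3))
```

For the real 3000 documents you would read each file's text into the `documents` iterable and call `top_keywords(documents, k=300)`. Single words and phrases share one ranking here, which is a simplification; scoring them separately may suit some corpora better.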
