Extract most important keywords from a set of documents

故里飘歌 2020-12-21 16:00

I have a set of 3000 text documents and I want to extract the top 300 keywords (each may be a single word or a multi-word phrase).

I have tried the approaches below:

RAK

4 answers
  •  天命终不由人
    2020-12-21 16:47

    It is better to choose those 300 words manually (it is not that many, and it is a one-time task). The code below is written in Python 3:

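    import os

    # Manually chosen keywords (placeholders; replace with your own list).
    topWords = ["word1", "word2"]  # ... etc.
    wordsCount = 0
    for file in os.listdir():
        if not os.path.isfile(file):
            continue  # skip subdirectories
        # Read the whole file; UTF-8 is assumed and undecodable bytes are ignored.
        with open(file, "r", encoding="utf-8", errors="ignore") as file_opened:
            text = file_opened.read()
        for word in topWords:
            # Substring match against the file's text; stop counting at 300 hits.
            if wordsCount < 300 and word in text:
                print("I found %s" % word)
                wordsCount += 1
        # Check wordsCount again to leave the outer loop as well.
        if wordsCount >= 300:
            break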
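
If picking 300 words by hand from scratch feels daunting, a rough candidate list could be generated first and then pruned manually. The sketch below is only one possible heuristic, not something taken from the answer above: it ranks single words and two-word phrases by how many documents contain them. The function name `candidate_keywords`, the tiny `STOPWORDS` set, and the tokenizing regex are all illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on"}

def candidate_keywords(texts, top_n=300, max_words=2):
    """Rank words and short phrases by the number of documents containing them."""
    doc_freq = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        seen = set()
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                words = tokens[i:i + n]
                # Skip phrases that start or end with a stopword.
                if words[0] in STOPWORDS or words[-1] in STOPWORDS:
                    continue
                seen.add(" ".join(words))
        # Each document counts a given phrase at most once.
        doc_freq.update(seen)
    return [phrase for phrase, _ in doc_freq.most_common(top_n)]
```

The returned list can then be reviewed by hand and pasted into `topWords`. Document frequency favors terms that are common across the whole collection, so very generic words may still need to be removed manually.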
    
