I have a set of 3000 text documents and I want to extract top 300 keywords (could be single word or multiple words).
I have tried the below approaches -
RAK
Is better for you to choose manually those 300 words (it's not so much and is one time) - Code Written in Python 3
import os
files = os.listdir()
topWords = ["word1", "word2.... etc"]
wordsCount = 0
for file in files:
file_opened = open(file, "r")
lines = file_opened.read().split("\n")
for word in topWords:
if word in lines and wordsCount < 301:
print("I found %s" %word)
wordsCount += 1
#Check Again wordsCount to close first repetitive instruction
if wordsCount == 300:
break