The Stanford NLP tagger, demo'd here, gives an output like this:
Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.
What do the Part of Speech tags mean?
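Those are Penn Treebank tags (JJ = adjective, NNS = plural noun, VBP = non-3rd-person present verb, RB = adverb). As a rough sketch, NLTK's pos_tag uses the same tagset, so you can reproduce comparable output in Python (the exact tags may differ slightly from Stanford's):

import nltk

# One-time model downloads for the tokenizer and tagger
# (resource names can depend on your NLTK version)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("Colorless green ideas sleep furiously.")
print(nltk.pos_tag(tokens))
# e.g. [('Colorless', 'JJ'), ('green', 'JJ'), ('ideas', 'NNS'),
#       ('sleep', 'VBP'), ('furiously', 'RB'), ('.', '.')]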
In my experience spaCy is quite fast; even on a low-end notebook it runs like this:
import spacy
import time

start = time.time()

# Read the whole input file into one string
with open('d:/dictionary/e-store.txt') as f:
    text = f.read()

word = 0        # total token count
result = []     # collected noun tokens

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)

# Check every token's POS tag, keeping the nouns
for token in doc:
    if token.pos_ == "NOUN":
        result.append(token.text)
    word += 1

elapsed = time.time() - start
print("From", word, "words, there is", len(result), "NOUN found in", elapsed, "seconds")
The output over several trials:
From 3547 words, there is 913 NOUN found in 7.768507719039917 seconds
From 3547 words, there is 913 NOUN found in 7.408619403839111 seconds
From 3547 words, there is 913 NOUN found in 7.431427955627441 seconds
So I don't think you need to worry about looping over each token to check its POS tag. :)
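If you prefer, the same count can be written as a list comprehension; it is just a stylistic alternative and performs about the same, since the nlp(text) call dominates the runtime:

# Same result, written as a comprehension over the parsed doc
nouns = [token.text for token in doc if token.pos_ == "NOUN"]
print(len(doc), "tokens,", len(nouns), "nouns")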
I got a further improvement by disabling a pipeline component I didn't need (the named entity recognizer):
nlp = spacy.load("en_core_web_sm", disable=["ner"])
The result is faster:
From 3547 words, there is 913 NOUN found in 6.212834596633911 seconds
From 3547 words, there is 913 NOUN found in 6.257707595825195 seconds
From 3547 words, there is 913 NOUN found in 6.371225833892822 seconds
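If you want to confirm which components are still running after the disable, nlp.pipe_names lists the active pipeline (a quick sketch; the exact component names depend on the model version):

import spacy

nlp = spacy.load("en_core_web_sm", disable=["ner"])
# List the components that remain active, e.g. ['tok2vec', 'tagger', 'parser', ...]
print(nlp.pipe_names)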