The Stanford NLP tagger, demo'd here, gives an output like this:
Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.
What do the Part of Speech tags mean?
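Those are Penn Treebank tags (JJ = adjective, NNS = plural noun, VBP = non-3rd-person present verb, RB = adverb). As a rough sketch, NLTK's pos_tag uses the same tagset, so you can reproduce comparable output in Python (the exact tags may differ slightly from Stanford's):

import nltk

# One-time model downloads for the tokenizer and tagger
# (resource names can depend on your NLTK version)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("Colorless green ideas sleep furiously.")
print(nltk.pos_tag(tokens))
# e.g. [('Colorless', 'JJ'), ('green', 'JJ'), ('ideas', 'NNS'),
#       ('sleep', 'VBP'), ('furiously', 'RB'), ('.', '.')]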
In my experience spaCy is quite fast; even on a low-end notebook it runs like this:
import spacy
import time

start = time.time()

# Read the whole input file into one string
with open('d:/dictionary/e-store.txt') as f:
    text = f.read()

word = 0        # total token count
result = []     # collected noun tokens

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)

# Check every token's POS tag, keeping the nouns
for token in doc:
    if token.pos_ == "NOUN":
        result.append(token.text)
    word += 1

elapsed = time.time() - start
print("From", word, "words, there is", len(result), "NOUN found in", elapsed, "seconds")
The output over several trials:
From 3547 words, there is 913 NOUN found in 7.768507719039917 seconds
From 3547 words, there is 913 NOUN found in 7.408619403839111 seconds
From 3547 words, there is 913 NOUN found in 7.431427955627441 seconds
So I don't think you need to worry about looping over each token to check its POS tag. :)
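If you prefer, the same count can be written as a list comprehension; it is just a stylistic alternative and performs about the same, since the nlp(text) call dominates the runtime:

# Same result, written as a comprehension over the parsed doc
nouns = [token.text for token in doc if token.pos_ == "NOUN"]
print(len(doc), "tokens,", len(nouns), "nouns")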
I got a further improvement by disabling a pipeline component I didn't need (the named entity recognizer):
nlp = spacy.load("en_core_web_sm", disable=["ner"])
The result is faster:
From 3547 words, there is 913 NOUN found in 6.212834596633911 seconds
From 3547 words, there is 913 NOUN found in 6.257707595825195 seconds
From 3547 words, there is 913 NOUN found in 6.371225833892822 seconds
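If you want to confirm which components are still running after the disable, nlp.pipe_names lists the active pipeline (a quick sketch; the exact component names depend on the model version):

import spacy

nlp = spacy.load("en_core_web_sm", disable=["ner"])
# List the components that remain active, e.g. ['tok2vec', 'tagger', 'parser', ...]
print(nlp.pipe_names)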