Question
import numpy as np
from nltk.tag import StanfordNERTagger
from nltk.tokenize import word_tokenize
# Classifier model: english.all.3class.distsim.crf.ser.gz
st = StanfordNERTagger(
    '/media/sf_codebase/modules/stanford-ner-2018-10-16/classifiers/english.all.3class.distsim.crf.ser.gz',
    '/media/sf_codebase/modules/stanford-ner-2018-10-16/stanford-ner.jar',
    encoding='utf-8')
After initializing the Stanford NER tagger with the code above, the following code takes about 10 seconds to tag a single short sentence, as shown below. How can I speed it up?
%%time
text = "My name is John Doe"
tokenized_text = word_tokenize(text)
classified_text = st.tag(tokenized_text)
print(classified_text)
Output
[('My', 'O'), ('name', 'O'), ('is', 'O'), ('John', 'PERSON'), ('Doe', 'PERSON')]
CPU times: user 4 ms, sys: 20 ms, total: 24 ms
Wall time: 10.9 s
Answer 1:
Another solution within NLTK is to stop using the old nltk.tag.StanfordNERTagger and use the newer nltk.parse.CoreNLPParser instead. See, e.g., https://github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK.
More generally, the secret to good performance is to run a server on the Java side, which you can call repeatedly without starting a new subprocess for each sentence processed. You can use either the NERServer if you just need NER, or the StanfordCoreNLPServer for all CoreNLP functionality. There are a number of Python interfaces to it; see https://stanfordnlp.github.io/CoreNLP/other-languages.html#python
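As a rough illustration of the server approach, the sketch below posts text to an already running StanfordCoreNLPServer over HTTP using only the Python standard library, then pulls out (word, NER tag) pairs from the JSON reply. The address http://localhost:9000, the annotator list, and the helper names annotate and extract_entities are assumptions made for this example; they are not taken from the answer above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed server address; adjust to wherever StanfordCoreNLPServer is listening.
CORENLP_URL = "http://localhost:9000"


def annotate(text, url=CORENLP_URL, timeout=30):
    """POST text to a running CoreNLP server and return the parsed JSON reply."""
    properties = {
        "annotators": "tokenize,ssplit,pos,lemma,ner",
        "outputFormat": "json",
    }
    # The pipeline settings travel in the query string, the raw text in the body.
    query = urlencode({"properties": json.dumps(properties)})
    request = Request(
        f"{url}/?{query}",
        data=text.encode("utf-8"),  # supplying data makes this a POST request
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_entities(document):
    """Flatten a CoreNLP JSON document into (word, ner_tag) pairs."""
    return [
        (token["word"], token["ner"])
        for sentence in document["sentences"]
        for token in sentence["tokens"]
    ]


# Usage, with a server running:
#   pairs = extract_entities(annotate("My name is John Doe"))
```

Because the JVM and its models stay loaded inside the server, each call costs one HTTP round trip instead of a fresh Java startup.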
Answer 2:
Found the answer.
Start the Stanford NER server in the background, from the folder where Stanford NER was unzipped:
java -Djava.ext.dirs=./lib -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -port 9199 -loadClassifier ./classifiers/english.all.3class.distsim.crf.ser.gz
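If you would rather start the server from Python than from a shell, something along these lines might work. It is only a sketch: the helper names ner_server_command and start_ner_server, the 60-second wait, and the polling interval are assumptions for illustration, and the java flags are copied unchanged from the command above. Note that Java 9 and later reject -Djava.ext.dirs, so on a newer JVM the lib directory would have to go on the classpath instead.

```python
import socket
import subprocess
import time


def ner_server_command(port=9199,
                       classifier="./classifiers/english.all.3class.distsim.crf.ser.gz"):
    """Build the java command line shown above as an argument list."""
    return [
        "java", "-Djava.ext.dirs=./lib", "-cp", "stanford-ner.jar",
        "edu.stanford.nlp.ie.NERServer",
        "-port", str(port),
        "-loadClassifier", classifier,
    ]


def start_ner_server(ner_dir, port=9199, wait_seconds=60):
    """Start NERServer inside ner_dir and block until its port accepts connections."""
    process = subprocess.Popen(ner_server_command(port), cwd=ner_dir)
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        # Stop waiting early if the JVM has already exited (bad path, bad flag, ...).
        if process.poll() is not None:
            raise RuntimeError("NERServer exited with code %s" % process.returncode)
        try:
            socket.create_connection(("localhost", port), timeout=1).close()
            return process
        except OSError:
            time.sleep(0.5)  # classifier still loading; try again shortly
    process.terminate()
    raise TimeoutError("NERServer did not open port %d in time" % port)
```

The returned process handle can be kept around and stopped with process.terminate() once tagging is finished.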
Then connect to the running server from Python using the sner library.
from sner import Ner
tagger = Ner(host='localhost', port=9199)
Then run the tagger.
%%time
classified_text = tagger.get_entities(text)
print(classified_text)
Output:
[('My', 'O'), ('name', 'O'), ('is', 'O'), ('John', 'PERSON'), ('Doe', 'PERSON')]
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 18.2 ms
Roughly 600 times faster in wall-clock time (10.9 s down to 18.2 ms)! Wow!
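The speedup comes from talking to a JVM that is already running with the classifier loaded. To make that round trip concrete, here is a minimal sketch of an equivalent client built on a plain TCP socket. It assumes the server replies in slash-tag form (word/TAG separated by spaces) and closes the connection after each reply; the function names parse_slash_tags and tag_via_socket are made up for this example and are not part of sner.

```python
import socket


def parse_slash_tags(tagged_line):
    """Turn 'John/PERSON Doe/PERSON' into [('John', 'PERSON'), ('Doe', 'PERSON')]."""
    pairs = []
    for item in tagged_line.split():
        # Split on the last slash so tokens that themselves contain '/' stay intact.
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


def tag_via_socket(text, host="localhost", port=9199):
    """Send one line of text to the NER server and return (word, tag) pairs."""
    # One line per request, so collapse embedded newlines and extra whitespace.
    line = " ".join(text.split()) + "\n"
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(line.encode("utf-8"))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # peer closed the connection: the reply is complete
                break
            chunks.append(chunk)
    return parse_slash_tags(b"".join(chunks).decode("utf-8"))
```

With the server from the steps above listening on port 9199, tag_via_socket("My name is John Doe") would be called in place of tagger.get_entities(text).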
Source: https://stackoverflow.com/questions/57424885/how-to-speedup-stanford-nlp-in-python