Unable to use Stanford NER in python module

Question

I want to use the Python Stanford NER module but keep getting an error. I searched the internet but found nothing. Here is the basic usage that produces the error.

import ner
tagger = ner.HttpNER(host='localhost', port=8080)
tagger.get_entities("University of California is located in California, United States")

Error

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    tagger.get_entities("University of California is located in California, United States")
  File "C:\Python27\lib\site-packages\ner\client.py", line 81, in get_entities
    tagged_text = self.tag_text(text)
  File "C:\Python27\lib\site-packages\ner\client.py", line 165, in tag_text
    c.request('POST', self.location, params, headers)
  File "C:\Python27\lib\httplib.py", line 1057, in request
    self._send_request(method, url, body, headers)
  File "C:\Python27\lib\httplib.py", line 1097, in _send_request
    self.endheaders(body)
  File "C:\Python27\lib\httplib.py", line 1053, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 897, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 859, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 836, in connect
    self.timeout, self.source_address)
  File "C:\Python27\lib\socket.py", line 575, in create_connection
    raise err
error: [Errno 10061] No connection could be made because the target machine actively refused it

I am using Windows 10 with the latest Java installed.

Answer 1:

The Python Stanford NER module is a wrapper around Stanford NER that lets you query the NER service from Python.
The NER service is separate from the Python module: it is a Java program. To access it from Python, or in any other way, you first need to start the service. The "actively refused" error (Errno 10061) means nothing was listening on localhost:8080 when the client tried to connect.
Details on how to start the Java program/service can be found here: http://nlp.stanford.edu/software/CRF-NER.shtml
Stanford NER comes with a .bat file for Windows and a .sh file for Unix/Linux. I think these files start the GUI.
To start the service without the GUI, run a command similar to this:
java -mx600m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz
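This runs the NER jar, sets the maximum Java heap size (-mx600m), and selects the classifier to load. (I think you'll have to be in the Stanford NER directory to run this.)
Note that, as far as I can tell, CRFClassifier on its own does not open a network port. For a server that a client can connect to, the Stanford NER documentation describes edu.stanford.nlp.ie.NERServer, which takes a -port option (8080 to match the code in the question). That is a plain socket server, so the ner module's SocketNER client is probably the right match for it rather than HttpNER.
Once the NER server is running, you will be able to run your Python code and query it.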
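
Before calling the tagger it can help to confirm that something is actually listening on the port. Below is a minimal sketch using only the Python 3 standard library. The function name is_port_open and the default timeout are illustrative choices, not part of the ner module.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (for example "connection refused")
        # when nothing accepts the connection within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If is_port_open("localhost", 8080) returns False, the Java server is not running (or is listening on a different port), and the Python client will fail with the same "actively refused" error as in the question.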

Answer 2:

This is a complete Stanford NER script for Python 3.x.

This code reads each text file in the "TextFilestoTest" folder, detects the entities, and stores them in a data frame (testing).

import os

import pandas as pd
from nltk.tag import StanfordNERTagger

# Point NLTK at the Java executable before the tagger is used
java_path = "C:/Program Files (x86)/Java/jre1.8.0_191/bin/java.exe"
os.environ['JAVAHOME'] = java_path

# Paths to the trained classifier and the Stanford NER jar
stanford_classifier = 'ner-trained-EvensTrain.ser.gz'
stanford_ner_path = 'stanford-ner.jar'

# Create the tagger object
st = StanfordNERTagger(stanford_classifier, stanford_ner_path, encoding='utf-8')


def get_continuous_chunks(tagged_sent):
    continuous_chunk = []
    current_chunk = []

    for token, tag in tagged_sent:
        if tag != "O":  # "O" (capital letter O, not zero) marks tokens outside any entity
            current_chunk.append((token, tag))
        else:
            if current_chunk: # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
                current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk

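TestFiles = './TextFilestoTest/'
files_path = os.listdir(TestFiles)
Test = {}

# Read every file into a dict keyed by the file name without its extension
# (the files are assumed to be UTF-8, matching the tagger's encoding)
for i in files_path:
    p = os.path.join(TestFiles, i)
    g = os.path.splitext(i)[0]
    with open(p, 'r', encoding='utf-8') as f:
        Test[g] = f.read()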

# Predict the labels of all words in the 200 text files and collect the entities
rows = []
for i in Test:
    test_text = Test[i].replace("\n", " ")
    tokenized_text = test_text.split()  # plain whitespace tokenization
    classified_text = st.tag(tokenized_text)
    named_entities = get_continuous_chunks(classified_text)

    # Flatten the chunks into one row per entity token
    for chunk in named_entities:
        for token, tag in chunk:
            rows.append({"filename": i, "Word": token, "Label": tag})

# Build the data frame once; DataFrame.append was removed in pandas 2.0
df_fin = pd.DataFrame(rows, columns=["filename", "Word", "Label"])

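# Empty frame with the same columns, and the list of distinct file names
# (neither is used further in this snippet)
df_fin_vone = pd.DataFrame(columns = ["filename","Word","Label"])
test_files_len = list(set(df_fin['filename']))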
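
get_continuous_chunks groups every run of consecutive entity tokens, so two neighbouring entities of different types land in the same chunk. One possible variant, sketched here with itertools.groupby, merges only adjacent tokens that share the same tag and joins them into phrases. The function name group_entities and the sample tags are made up for illustration (they are not real classifier output), and the sketch assumes the classifier marks non-entity tokens with the letter "O".

```python
from itertools import groupby


def group_entities(tagged_sent, outside_tag="O"):
    """Merge runs of adjacent tokens with the same tag into (phrase, tag) pairs."""
    entities = []
    for tag, run in groupby(tagged_sent, key=lambda pair: pair[1]):
        if tag == outside_tag:
            continue  # skip tokens that are not part of any entity
        phrase = " ".join(token for token, _ in run)
        entities.append((phrase, tag))
    return entities


# Hand-written sample in the (token, tag) format that st.tag() returns
sample = [
    ("University", "ORGANIZATION"), ("of", "ORGANIZATION"),
    ("California", "ORGANIZATION"), ("is", "O"), ("located", "O"),
    ("in", "O"), ("California", "LOCATION"), (",", "O"),
    ("United", "LOCATION"), ("States", "LOCATION"),
]
print(group_entities(sample))
```

Each resulting (phrase, tag) pair could be stored as one data frame row instead of one row per token, if whole entity names are what you need.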

If you have any questions, comment below and I will answer. Thank you.

Source: https://stackoverflow.com/questions/36668340/unable-to-use-stanford-ner-in-python-module

Tags

python

python-2.7

nlp

stanford-nlp

named-entity-recognition