nltk StanfordNERTagger : NoClassDefFoundError: org/slf4j/LoggerFactory (In Windows)

Posted by 偶尔善良 on 2019-11-28 23:41:08
alvas

EDITED

Note: The following answer will only work on:

  • NLTK version 3.1
  • Stanford Tools compiled since 2015-04-20

Both tools change rather quickly, and the API might look very different 3-6 months later. Please treat the following answer as a temporary fix, not a permanent one.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface Stanford NLP tools using NLTK!!


Step 1

First update your NLTK to the version 3.1 using

pip install -U nltk

or (for Windows) download the latest NLTK from http://pypi.python.org/pypi/nltk

Then check that you have version 3.1 using:

python3 -c "import nltk; print(nltk.__version__)"
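
If you want to make the same check from inside a script, here is a minimal sketch. The helper name version_at_least is hypothetical (it is not part of NLTK), and it only compares the leading numeric parts of the version string:

```python
import re


def version_at_least(version, minimum=(3, 1)):
    """Return True if a dotted version string is >= `minimum`.

    Hypothetical helper, not an NLTK function. Only the first
    len(minimum) numeric components are compared.
    """
    numbers = tuple(int(n) for n in re.findall(r"\d+", version))
    return numbers[:len(minimum)] >= tuple(minimum)


# Typical use (assumes nltk is installed):
#   import nltk
#   if not version_at_least(nltk.__version__):
#       raise RuntimeError("Please upgrade: pip install -U nltk")
```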

Step 2

Then download the zip file from http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip, unzip it, and save the contents to C:\some\path\to\stanford-ner\ (on Windows).

Step 3

Then set the environment variable for CLASSPATH to C:\some\path\to\stanford-ner\stanford-ner.jar

and the environment variable for STANFORD_MODELS to C:\some\path\to\stanford-ner\classifiers

Or in command line (ONLY for Windows):

set CLASSPATH=%CLASSPATH%;C:\some\path\to\stanford-ner\stanford-ner.jar
set STANFORD_MODELS=%STANFORD_MODELS%;C:\some\path\to\stanford-ner\classifiers

(See https://stackoverflow.com/a/17176423/610569 for click-click GUI instructions for setting environment variables in Windows)

(See Stanford Parser and NLTK for details on setting environment variables in Linux)
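
If you prefer to set both variables from Python before creating the tagger, something like the following sketch should do the same job as the set commands above. The helper name stanford_env is hypothetical, and the folder layout (stanford-ner.jar and classifiers directly under the NER folder) is assumed to match the zip from Step 2:

```python
import os


def stanford_env(ner_dir, environ=None):
    """Return CLASSPATH and STANFORD_MODELS values for a Stanford NER folder.

    Hypothetical helper: existing values are kept and the new entries
    are appended with the platform separator (';' on Windows, ':' elsewhere).
    """
    environ = os.environ if environ is None else environ
    additions = {
        "CLASSPATH": os.path.join(ner_dir, "stanford-ner.jar"),
        "STANFORD_MODELS": os.path.join(ner_dir, "classifiers"),
    }
    result = {}
    for name, value in additions.items():
        existing = environ.get(name)
        result[name] = existing + os.pathsep + value if existing else value
    return result


# Typical use, before importing/creating the tagger:
#   os.environ.update(stanford_env(r"C:\some\path\to\stanford-ner"))
```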

Step 4

Then in python:

>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]
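
The tagger returns one (token, tag) pair per word. If you want whole entities such as "Stony Brook University", one possible post-processing step is to merge consecutive tokens that share a tag. This is only a sketch: group_entities is a hypothetical helper, not an NLTK function, and it will also merge two different entities of the same type if they happen to be adjacent.

```python
from itertools import groupby


def group_entities(tagged_tokens, outside_tag="O"):
    """Merge consecutive tokens with the same tag into (phrase, tag) pairs.

    Tokens tagged with `outside_tag` are dropped.
    """
    entities = []
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != outside_tag:
            phrase = " ".join(token for token, _ in group)
            entities.append((phrase, tag))
    return entities


# The tagger output shown above, copied in by hand.
tagged = [("Rami", "PERSON"), ("Eid", "PERSON"), ("is", "O"),
          ("studying", "O"), ("at", "O"), ("Stony", "ORGANIZATION"),
          ("Brook", "ORGANIZATION"), ("University", "ORGANIZATION"),
          ("in", "O"), ("NY", "O")]
print(group_entities(tagged))
# [('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION')]
```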

Without setting the environment variables, you can try:

import os
from nltk.tag import StanfordNERTagger

# Raw string, so backslash sequences such as \t are not read as escapes.
stanford_ner_dir = r'C:\some\path\to\stanford-ner'
eng_model_filename = os.path.join(stanford_ner_dir, 'classifiers', 'english.all.3class.distsim.crf.ser.gz')
my_path_to_jar = os.path.join(stanford_ner_dir, 'stanford-ner.jar')

st = StanfordNERTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar)
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

See more detailed instructions on Stanford Parser and NLTK

I encountered exactly the same problem as you described yesterday.

There are 3 things you need to do.

1) Update your NLTK.

pip install -U nltk

Your version should be 3.1 or later. I see you are using

from nltk.tag.stanford import StanfordNERTagger

However, you need to use the new module:

from nltk.tag import StanfordNERTagger

2) Download slf4j and update your CLASSPATH.

Here is how you update your CLASSPATH.

import os

javapath = "/Users/aerin/Downloads/stanford-ner-2014-06-16/stanford-ner.jar:/Users/aerin/java/slf4j-1.7.13/slf4j-log4j12-1.7.13.jar"
os.environ['CLASSPATH'] = javapath

As you can see above, javapath contains two paths: one is where stanford-ner.jar is, and the other is where you downloaded slf4j-log4j12-1.7.13.jar (it can be downloaded here: http://www.slf4j.org/download.html). The two paths are joined with ':' as on Linux and macOS; on Windows the classpath separator is ';'.

3) Don't forget to specify where you downloaded 'english.all.3class.distsim.crf.ser.gz' & 'stanford-ner.jar'

st = StanfordNERTagger('/Users/aerin/Downloads/stanford-ner-2014-06-16/classifiers/english.all.3class.distsim.crf.ser.gz','/Users/aerin/Downloads/stanford-ner-2014-06-16/stanford-ner.jar') 

st.tag("Doneyo lab did such an awesome job!".split())
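
A NoClassDefFoundError often just means that one of the CLASSPATH entries from step 2 points to a file that is not there. A quick way to check is sketched below; missing_classpath_entries is a hypothetical helper name, and note that wildcard entries such as /libs/* would be reported as missing because they are not literal paths.

```python
import os


def missing_classpath_entries(classpath, sep=os.pathsep):
    """Return the classpath entries that do not exist on disk.

    `sep` defaults to the platform separator (';' on Windows, ':' elsewhere).
    """
    entries = [entry for entry in classpath.split(sep) if entry]
    return [entry for entry in entries if not os.path.exists(entry)]


# Typical use after setting os.environ['CLASSPATH']:
#   print(missing_classpath_entries(os.environ.get('CLASSPATH', '')))
```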
alvas

NOTE:

Below is a temporary hack to work with:

  • NLTK version 3.1
  • Stanford NER compiled on 2015-12-09

This solution is NOT meant to be an eternal solution.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface Stanford NLP tools using NLTK!!

Please track updates on this issue if you do not want to use this "hack": https://github.com/nltk/nltk/issues/1237 or please use the NER tool compiled on 2015-04-20.


In Short

Make sure that you have:

  • NLTK version 3.1
  • Stanford NER compiled on 2015-12-09
  • Set the environment variables for CLASSPATH and STANFORD_MODELS

To set environment variables in Windows:

set CLASSPATH=%CLASSPATH%;C:\some\path\to\stanford-ner\stanford-ner.jar
set STANFORD_MODELS=%STANFORD_MODELS%;C:\some\path\to\stanford-ner\classifiers

To set environment variables in Linux:

export STANFORDTOOLSDIR=/home/some/path/to/stanfordtools/
export CLASSPATH=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/stanford-ner.jar
export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/classifiers

Then:

>>> import os
>>> from nltk.internals import find_jars_within_path
>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
# Note this is where your stanford_jar is saved.
# We are accessing the environment variables you've
# set through the NLTK API.
>>> print(st._stanford_jar)
/home/alvas/stanford-ner-2015-12-09/stanford-ner.jar
# Get the folder that contains stanford-ner.jar
# (os.path.dirname works on both Windows and Linux).
>>> stanford_dir = os.path.dirname(st._stanford_jar)
# Use the `find_jars_within_path` function to get all the
# jar files out from stanford NER tool under the libs/ dir.
>>> stanford_jars = find_jars_within_path(stanford_dir)
# Put the jars back into the `stanford_jar` classpath.
# os.pathsep is ':' on Linux and ';' on Windows.
>>> st._stanford_jar = os.pathsep.join(stanford_jars)
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]
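
To make it clearer what the hack above is doing, here is a standalone sketch of the same idea using only the standard library. It is not NLTK's implementation of find_jars_within_path; collect_jars and classpath_for are hypothetical names used for illustration.

```python
import os


def collect_jars(root):
    """Return a sorted list of every .jar file under `root`, recursively."""
    jars = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".jar"):
                jars.append(os.path.join(dirpath, name))
    return sorted(jars)


def classpath_for(root):
    """Join all jars under `root` with the platform classpath separator."""
    return os.pathsep.join(collect_jars(root))


# Typical use (the folder is an example path):
#   print(classpath_for("/home/alvas/stanford-ner-2015-12-09"))
```

The resulting string is the kind of value that the session above assigns to st._stanford_jar, so that the slf4j jars under lib/ end up on the Java classpath next to stanford-ner.jar.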

I fixed it!

You should put the full path of slf4j-api.jar in CLASSPATH.

Instead of adding the jar path to the system environment variable, you can do it in code like this:

import os

_CLASS_PATH = "."
if os.environ.get('CLASSPATH') is not None:
    _CLASS_PATH = os.environ.get('CLASSPATH')
# Raw string, so the backslashes in the Windows path are kept literally.
os.environ['CLASSPATH'] = _CLASS_PATH + r';F:\Python\Lib\slf4j\slf4j-api-1.7.13.jar'

Important: the code in nltk/*/stanford.py resets the classpath like this:

stdout, stderr = java(cmd, classpath=self._stanford_jar, stdout=PIPE, stderr=PIPE)

For example, see line 90 of \Python34\Lib\site-packages\nltk\tokenize\stanford.py.

You can fix it like this:

_CLASS_PATH = "."
if os.environ.get('CLASSPATH') is not None:
    _CLASS_PATH = os.environ.get('CLASSPATH')
stdout, stderr = java(cmd, classpath=(self._stanford_jar, _CLASS_PATH), stdout=PIPE, stderr=PIPE)

Current Stanford NER tagger version is not compatible with nltk because it requires additional jars that nltk cannot add to the CLASSPATH.

Instead, prefer an older version of the Stanford NER Tagger that works perfectly fine, like this one: http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip

For those who want to use Stanford NER >= 3.6.0 instead of the 2015-01-30 (3.5.1) or other old version, do this instead:

  1. Put the stanford-ner.jar and slf4j-api.jar into the same folder

    For example, I put the following files to /path-to-libs/

    • stanford-ner-3.6.0.jar
    • slf4j-api-1.7.18.jar
  2. Then:

    import nltk

    # The Java wildcard picks up every jar in the folder.
    classpath = "/path-to-libs/*"

    st = nltk.tag.StanfordNERTagger(
        "/path-to-model/ner-model.ser.gz",
        "/path-to-libs/stanford-ner-3.6.0.jar"
    )
    st._stanford_jar = classpath
    result = st.tag(["Hello"])
    
Run2

I think the issue is with how slf4j has been used.

I am on nltk 3.1 and using stanford-parser-full-2015-12-09. The only way I could get it to work was to modify /Library/Python/2.7/site-packages/nltk/parse/stanford.py and add the slf4j jar to self._classpath within the __init__ method.

That solved it. Crude, but it works.

Note: I was not trying NER exactly. I was trying something like the code below.

import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = '/Users/run2/stanford-parser-full-2015-12-09'
os.environ['STANFORD_MODELS'] = '/Users/run2/stanford-parser-full-2015-12-09'
parser = stanford.StanfordParser(model_path='/Users/run2/stanford-parser-full-2015-12-09/englishPCFG.ser.gz')
# raw_parse_sents expects a list of sentence strings, not a single string.
sentences = parser.raw_parse_sents(['<some sentence from my corpus>'])

It looks like the Java environment is not set for Python in your code.

You could do that by using the following code:

from nltk.tag.stanford import NERTagger
import os
java_path = "/Java/jdk1.8.0_45/bin/java.exe"
os.environ['JAVAHOME'] = java_path
st = NERTagger('../ner-model.ser.gz','../stanford-ner.jar')
tagging = st.tag(text.split())   

Check if this solves your problem.
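
If you are not sure where java is installed, a small lookup like the sketch below may help before hard-coding the path. locate_java is a hypothetical helper: it assumes, as in the code above, that JAVAHOME holds the full path to the java executable, and otherwise falls back to searching PATH.

```python
import os
import shutil


def locate_java(environ=None):
    """Return a path to the java executable, or None if it cannot be found.

    Checks JAVAHOME first (assumed to be the full path to the binary),
    then falls back to a PATH lookup.
    """
    environ = os.environ if environ is None else environ
    candidate = environ.get("JAVAHOME")
    if candidate and os.path.isfile(candidate):
        return candidate
    return shutil.which("java")


# Typical use:
#   java_path = locate_java()
#   if java_path is not None:
#       os.environ['JAVAHOME'] = java_path
```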

The best thing to do is simply to download the latest version of the Stanford NER tagger where the dependency problem is now fixed (March 2018).

wget https://nlp.stanford.edu/software/stanford-ner-2018-02-27.zip