nltk StanfordNERTagger : NoClassDefFoundError: org/slf4j/LoggerFactory (In Windows)

Posted by 自闭症网瘾萝莉.ら on 2019-12-18 02:47:42

Question


NOTE: I am using Python 2.7 as part of the Anaconda distribution. I hope this is not a problem for NLTK 3.1.

I am trying to use NLTK for NER as follows:

import nltk
from nltk.tag.stanford import StanfordNERTagger 
#st = StanfordNERTagger('stanford-ner/all.3class.distsim.crf.ser.gz', 'stanford-ner/stanford-ner.jar')
st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
print st.tag(str)

but I get:

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
    at edu.stanford.nlp.io.IOUtils.<clinit>(IOUtils.java:41)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1117)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1076)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1057)
    at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3088)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 5 more

Traceback (most recent call last):
  File "X:\jnk.py", line 47, in <module>
    print st.tag(str)
  File "X:\Anaconda2\lib\site-packages\nltk\tag\stanford.py", line 66, in tag
    return sum(self.tag_sents([tokens]), []) 
  File "X:\Anaconda2\lib\site-packages\nltk\tag\stanford.py", line 89, in tag_sents
    stdout=PIPE, stderr=PIPE)
  File "X:\Anaconda2\lib\site-packages\nltk\internals.py", line 134, in java
    raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed : ['X:\\PROGRA~1\\Java\\JDK18~1.0_6\\bin\\java.exe', '-mx1000m', '-cp', 'X:\\stanford\\stanford-ner.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', 'X:\\stanford\\classifiers\\english.all.3class.distsim.crf.ser.gz', '-textFile', 'x:\\appdata\\local\\temp\\tmpqjsoma', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf8']

However, I can see that the slf4j jar is in my lib folder. Do I need to update an environment variable?

Edit

Thanks, everyone, for the help, but I still get the same error. Here is what I tried recently:

import nltk
from nltk.tag import StanfordNERTagger 
print(nltk.__version__)
stanford_ner_dir = 'X:\\stanford\\'
eng_model_filename= stanford_ner_dir + 'classifiers\\english.all.3class.distsim.crf.ser.gz'
my_path_to_jar= stanford_ner_dir + 'stanford-ner.jar'
st = StanfordNERTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar) 
print st._stanford_model
print st._stanford_jar

st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

and also

import nltk
from nltk.tag import StanfordNERTagger 
print(nltk.__version__)
st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
print st._stanford_model
print st._stanford_jar
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

I get:

3.1
X:\stanford\classifiers\english.all.3class.distsim.crf.ser.gz
X:\stanford\stanford-ner.jar

After that, it goes on to print the same stack trace as before: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory

Any idea why this might be happening? I updated my CLASSPATH as well. I even added all the relevant folders to my PATH environment variable: for example, the folder where I unzipped the Stanford jars, the place where I unzipped slf4j, and even the lib folder inside the stanford folder. I have no idea why this is happening :(

Could it be Windows? I have had problems with Windows paths before.

Update

  1. The Stanford NER version I have is 3.6.0. The zip file says stanford-ner-2015-12-09.zip

  2. I also tried using stanford-ner-3.6.0.jar instead of stanford-ner.jar, but I still get the same error

  3. When I right-click on the stanford-ner-3.6.0.jar, I notice the following (screenshot not reproduced in this copy)

I see this for all the files that I have extracted, even the slf4j files. Could this be causing the problem?

  4. Finally, why does the error message say

java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory

I do not see any folder named org anywhere.

Update: Environment variables

Here are my environment variables:

CLASSPATH
.;
X:\jre1.8.0_60\lib\rt.jar;
X:\stanford\stanford-ner-3.6.0.jar;
X:\stanford\stanford-ner.jar;
X:\stanford\lib\slf4j-simple.jar;
X:\stanford\lib\slf4j-api.jar;
X:\slf4j\slf4j-1.7.13\slf4j-1.7.13\slf4j-log4j12-1.7.13.jar

STANFORD_MODELS
X:\stanford\classifiers

JAVA_HOME
X:\PROGRA~1\Java\JDK18~1.0_6

PATH
X:\PROGRA~1\Java\JDK18~1.0_6\bin;
X:\stanford;
X:\stanford\lib;
X:\slf4j\slf4j-1.7.13\slf4j-1.7.13

Is anything wrong here?


Answer 1:


EDITED

Note: The following answer will only work on:

  • NLTK version 3.1
  • Stanford Tools compiled since 2015-04-20

Both tools change rather quickly, and the API might look very different 3-6 months later. Please treat the following answer as a temporary fix, not an eternal one.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface the Stanford NLP tools with NLTK!


Step 1

First, update your NLTK to version 3.1 using:

pip install -U nltk

or (for Windows) download the latest NLTK from http://pypi.python.org/pypi/nltk

Then check that you have version 3.1 using:

python3 -c "import nltk; print(nltk.__version__)"

Step 2

Then download the zip file from http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip, unzip it, and save it to C:\some\path\to\stanford-ner\ (on Windows).

Step 3

Then set the CLASSPATH environment variable to C:\some\path\to\stanford-ner\stanford-ner.jar

and the STANFORD_MODELS environment variable to C:\some\path\to\stanford-ner\classifiers

Or in command line (ONLY for Windows):

set CLASSPATH=%CLASSPATH%;C:\some\path\to\stanford-ner\stanford-ner.jar
set STANFORD_MODELS=%STANFORD_MODELS%;C:\some\path\to\stanford-ner\classifiers

(See https://stackoverflow.com/a/17176423/610569 for click-click GUI instructions for setting environment variables in Windows)

(See Stanford Parser and NLTK for details on setting environment variables in Linux)
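The two variables can also be set from Python itself before constructing the tagger. A minimal sketch, assuming a hypothetical install directory; `ntpath.join` is used so the Windows-style backslashes are built correctly regardless of which OS runs the snippet:

```python
import ntpath
import os

# Hypothetical install directory from Step 2; adjust to your actual path.
stanford_ner_dir = r'C:\some\path\to\stanford-ner'

# These are the two variables NLTK reads when it builds the java command.
os.environ['CLASSPATH'] = ntpath.join(stanford_ner_dir, 'stanford-ner.jar')
os.environ['STANFORD_MODELS'] = ntpath.join(stanford_ner_dir, 'classifiers')
```

Setting them in-process only affects the current Python session, which is convenient for experimenting before committing to a system-wide environment variable.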

Step 4

Then in python:

>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]

Without setting the environment variables, you can try:

from nltk.tag import StanfordNERTagger

stanford_ner_dir = 'C:\\some\\path\\to\\stanford-ner\\'
eng_model_filename = stanford_ner_dir + 'classifiers\\english.all.3class.distsim.crf.ser.gz'
my_path_to_jar = stanford_ner_dir + 'stanford-ner.jar'

st = StanfordNERTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar) 
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

See more detailed instructions on Stanford Parser and NLTK




Answer 2:


I encountered exactly the same problem as you described yesterday.

There are three things you need to do.

1) Update your NLTK.

pip install -U nltk

Your version should be 3.1 or newer, and I see you are using

from nltk.tag.stanford import StanfordNERTagger

However, you should use the new import path:

from nltk.tag import StanfordNERTagger

2) Download slf4j and update your CLASSPATH.

Here is how you update your CLASSPATH.

import os

javapath = "/Users/aerin/Downloads/stanford-ner-2014-06-16/stanford-ner.jar:/Users/aerin/java/slf4j-1.7.13/slf4j-log4j12-1.7.13.jar"
os.environ['CLASSPATH'] = javapath

As you can see above, javapath contains two paths: one is where stanford-ner.jar is, and the other is where you downloaded slf4j-log4j12-1.7.13.jar (it can be downloaded here: http://www.slf4j.org/download.html).

3) Don't forget to specify where you downloaded english.all.3class.distsim.crf.ser.gz and stanford-ner.jar:

st = StanfordNERTagger('/Users/aerin/Downloads/stanford-ner-2014-06-16/classifiers/english.all.3class.distsim.crf.ser.gz','/Users/aerin/Downloads/stanford-ner-2014-06-16/stanford-ner.jar') 

st.tag("Doneyo lab did such an awesome job!".split())



Answer 3:


NOTE:

Below is a temporary hack to work with:

  • NLTK version 3.1
  • Stanford NER compiled on 2015-12-09

This solution is NOT meant to be a permanent fix.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface the Stanford NLP tools with NLTK!

Please track updates on this issue if you do not want to use this "hack": https://github.com/nltk/nltk/issues/1237, or use the NER tool compiled on 2015-04-20.


In Short

Make sure that you have:

  • NLTK version 3.1
  • Stanford NER compiled on 2015-12-09
  • Set the environment variables for CLASSPATH and STANFORD_MODELS

To set environment variables in Windows:

set CLASSPATH=%CLASSPATH%;C:\some\path\to\stanford-ner\stanford-ner.jar
set STANFORD_MODELS=%STANFORD_MODELS%;C:\some\path\to\stanford-ner\classifiers

To set environment variables in Linux:

export STANFORDTOOLSDIR=/home/some/path/to/stanfordtools/
export CLASSPATH=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/stanford-ner.jar
export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/classifiers

Then:

>>> from nltk.internals import find_jars_within_path
>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
# Note this is where your stanford_jar is saved.
# We are accessing the environment variables you've 
# set through the NLTK API.
>>> print st._stanford_jar
/home/alvas/stanford-ner-2015-12-09/stanford-ner.jar
>>> stanford_dir = st._stanford_jar.rpartition("\\")[0] # windows
# Note in linux you do this instead: 
>>> stanford_dir = st._stanford_jar.rpartition('/')[0] # linux
# Use the `find_jars_within_path` function to get all the
# jar files out from stanford NER tool under the libs/ dir.
>>> stanford_jars = find_jars_within_path(stanford_dir)
# Put the jars back into the `stanford_jar` classpath.
>>> st._stanford_jar = ':'.join(stanford_jars) # linux
>>> st._stanford_jar = ';'.join(stanford_jars) # windows
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]
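Instead of choosing between ':' and ';' by hand as in the two joins above, `os.pathsep` yields the right classpath separator for whichever platform the code runs on. A small sketch with hypothetical jar paths standing in for what `find_jars_within_path` returns:

```python
import os

# Hypothetical jar list, standing in for the output of find_jars_within_path.
stanford_jars = [
    '/home/alvas/stanford-ner-2015-12-09/stanford-ner.jar',
    '/home/alvas/stanford-ner-2015-12-09/lib/slf4j-api.jar',
]

# os.pathsep is ';' on Windows and ':' elsewhere, so one line covers
# both of the platform-specific joins shown above.
classpath = os.pathsep.join(stanford_jars)
```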



Answer 4:


I fixed it!

You should include the full path of slf4j-api.jar in the CLASSPATH.

Instead of adding the jar path to a system environment variable, you can do it in code:

import os

_CLASS_PATH = "."
if os.environ.get('CLASSPATH') is not None:
    _CLASS_PATH = os.environ.get('CLASSPATH')
os.environ['CLASSPATH'] = _CLASS_PATH + r';F:\Python\Lib\slf4j\slf4j-api-1.7.13.jar'

Important: nltk/*/stanford.py resets the classpath like this:

stdout, stderr = java(cmd, classpath=self._stanford_jar, stdout=PIPE, stderr=PIPE)

e.g. \Python34\Lib\site-packages\nltk\tokenize\stanford.py, line 90

You can fix it like this:

_CLASS_PATH = "."
if os.environ.get('CLASSPATH') is not None:
    _CLASS_PATH = os.environ.get('CLASSPATH')
stdout, stderr = java(cmd, classpath=(self._stanford_jar, _CLASS_PATH), stdout=PIPE, stderr=PIPE)



Answer 5:


The current version of the Stanford NER tagger is not compatible with NLTK, because it requires additional jars that NLTK cannot add to the CLASSPATH.

Instead, prefer an older version of the Stanford NER tagger that works perfectly fine, such as this one: http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip




Answer 6:


For those who want to use Stanford NER >= 3.6.0 instead of 2015-01-30 (3.5.1) or another old version, do this instead:

  1. Put the stanford-ner.jar and slf4j-api.jar into the same folder

    For example, I put the following files into /path-to-libs/:

    • stanford-ner-3.6.0.jar
    • slf4j-api-1.7.18.jar
  2. Then:

    import nltk

    classpath = "/path-to-libs/*"

    st = nltk.tag.StanfordNERTagger(
        "/path-to-model/ner-model.ser.gz",
        "/path-to-libs/stanford-ner-3.6.0.jar"
    )
    st._stanford_jar = classpath
    result = st.tag(["Hello"])
    



Answer 7:


I think the issue is with how slf4j has been used.

I am on NLTK 3.1 and using stanford-parser-full-2015-12-09. The only way I could get it to work was to modify /Library/Python/2.7/site-packages/nltk/parse/stanford.py and add the slf4j jar to self._classpath within the __init__ method.

That solved it. Crude, but it works.
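The shape of the edit described above can be sketched with a stand-in tuple; the real `self._classpath` lives inside nltk/parse/stanford.py, and all paths here are hypothetical:

```python
# Stand-in for the edit described above: the parser's classpath is a
# collection of jar paths, and the fix appends the slf4j jar to it.
# All paths are hypothetical examples.
classpath = (
    '/Users/run2/stanford-parser-full-2015-12-09/stanford-parser.jar',
    '/Users/run2/stanford-parser-full-2015-12-09/stanford-parser-models.jar',
)
slf4j_jar = '/Users/run2/java/slf4j-1.7.13/slf4j-api-1.7.13.jar'
classpath = classpath + (slf4j_jar,)
```

Editing installed library files is fragile (the change is lost on upgrade), which is why the environment-variable approaches in the other answers are usually preferable.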

Note: I was not trying NER exactly. I was trying something like the below:

import os
from nltk.parse import stanford
os.environ['STANFORD_PARSER'] = '/Users/run2/stanford-parser-full-2015-12-09'
os.environ['STANFORD_MODELS'] = '/Users/run2/stanford-parser-full-2015-12-09'
parser = stanford.StanfordParser(model_path='/Users/run2/stanford-parser-full-2015-12-09/englishPCFG.ser.gz')
sentences = parser.raw_parse_sents('<some sentence from my corpus>')



Answer 8:


It seems to me that the Java environment is not set for Python in your code.

You can set it with the following code:

from nltk.tag.stanford import NERTagger
import os
java_path = "/Java/jdk1.8.0_45/bin/java.exe"
os.environ['JAVAHOME'] = java_path
st = NERTagger('../ner-model.ser.gz','../stanford-ner.jar')
tagging = st.tag(text.split())   

Check if this solves your problem.




Answer 9:


The best thing to do is simply to download the latest version of the Stanford NER tagger, where the dependency problem is now fixed (March 2018):

wget https://nlp.stanford.edu/software/stanford-ner-2018-02-27.zip


Source: https://stackoverflow.com/questions/34361725/nltk-stanfordnertagger-noclassdeffounderror-org-slf4j-loggerfactory-in-windo
