I am trying to analyse some French text with the Stanford CoreNLP tool (it's my first time trying to use any StanfordNLP software)
To do so, I have downloaded the v3.6.0 jar and the corresponding french models.
Then I run the server with:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
As described in this answer, I call the API with:
wget --post-data 'Bonjour le monde.' 'localhost:9000/?properties={"parse.model":"edu/stanford/nlp/models/parser/nndep/UD_French.gz", "annotators": "parse", "outputFormat": "json"}' -O -
but I get the following log + error:
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP
Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-1] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/parser/nndep/UD_French.gz ...
edu.stanford.nlp.io.RuntimeIOException: java.io.StreamCorruptedException: invalid stream header: 64696374
at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:188)
at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:212)
at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:115)
...
The solutions proposed here suggest the code and model version differs but I have dowloaded them from the same page (and they both have the same version number in their name) so I am pretty sure they are the same.
Any other hint on what I am doing wrong?
(I should also mention that I am not a Java expert, so maybe I forgot a stupid step... )
Ok, after a lot of readings and unsuccessful tries, I found a way to make it work (for v3.6.0). Here are the details, if they can be of any interest to someone else:
Dowload the code and french models from http://stanfordnlp.github.io/CoreNLP/index.html#download. Unzip the code
.zipand copy the french model.jarto that directory (do not remove the english models, they have different names anyway)cd to that directory and then run the server with:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
(it's a pity that the -prop flag doesn't help here)
Call the API repeating the properties listed in the
StanfordCoreNLP-french.properties:wget --header="Content-Type: text/plain; charset=UTF-8" --post-data 'Bonjour le monde.' 'localhost:9000/?properties={ "annotators": "tokenize,ssplit,pos,parse", "parse.model":"edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz", "pos.model":"edu/stanford/nlp/models/pos-tagger/french/french.tagger", "tokenize.language":"fr", "outputFormat": "json"}' -O -which finally gives a 200 response using the French models!
(NB: don't know how to make it work with the UI (same for utf-8 support))
来源:https://stackoverflow.com/questions/37838374/running-stanford-corenlp-server-with-french-models