stanford-nlp

PCFG vs SR Parser

北城以北 submitted on 2019-12-25 16:39:06
Question: It looks like stanfordnlp has had these SR models for some time. I am really new to NLP, but we are currently using the PCFG parser and having serious performance issues (to the point that we cut the maximum parse length down to 35), so I was wondering whether we could try the SR parser instead. I tried it with the Stanford POS tagger (english-left3words-distsim.tagger). Do you know how SR compares with PCFG on accuracy? I also see sentence root detection issues with SR and the dependency parse. Example: Michael Jeffrey Jordan, also known by his
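
For readers who want to try the switch, the sketch below shows one way to point a CoreNLP pipeline at the shift-reduce model instead of the default PCFG, which is the usual suggestion when the PCFG's parse length has to be capped for speed. The model paths are assumptions based on the standard 3.x distributions (the SR model ships in a separate models jar), so adjust them to whatever is actually on your classpath.

    import java.util.Properties;

    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class SrParserSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, parse");
            // Assumed model paths from the 3.x distributions; the shift-reduce model
            // is distributed separately from the default models jar.
            props.setProperty("pos.model",
                "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
            props.setProperty("parse.model",
                "edu/stanford/nlp/models/srparser/englishSR.ser.gz");

            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            Annotation doc = new Annotation("Michael Jeffrey Jordan is a former basketball player.");
            pipeline.annotate(doc);
            // Per-sentence constituency trees and dependencies can be read off doc as usual.
        }
    }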

ConllReader (Like RothCONLL04Reader) throws exception while reading relation training data with custom NER and custom relation

我只是一个虾纸丫 submitted on 2019-12-25 14:38:27
Question: In continuation of the following question: How to generate custom training data for Stanford relation extraction. Thanks to StanfordNLPHelp I am able to generate relation data with custom NER and, on top of it, regexner. I had to run my custom model at the end because otherwise it would misclassify lots of ORGANIZATION, PERSON, etc. Example custom NER classes: "DEGREE", "DESG". Example of relation training data: 0 ELECTEDBODY 0 O NNP/IN/NNP BOARD/OF/DIRECTORS O O O 0 ORGANIZATION 1 O NNP Board O O O
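
As a side note on chaining a custom model with the stock ones: the ner annotator accepts a comma-separated list of CRF models through the ner.model property and applies them in order, which is one way to control which labels win before regexner runs. The sketch below is only an illustration of that mechanism; custom-ner-model.ser.gz is a hypothetical path, not the questioner's model.

    import java.util.Properties;

    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class CustomNerOrderSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
            // Hypothetical custom CRF listed alongside a stock model. As I understand the
            // default combination mode, labels assigned by earlier models are not
            // overwritten by later ones, so the order of this list matters.
            props.setProperty("ner.model",
                "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz,"
                + "custom-ner-model.ser.gz");

            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            // The pipeline can now be used to tag text for relation-extraction training data.
        }
    }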

How to set delimiters for PTB tokenizer?

落爺英雄遲暮 submitted on 2019-12-25 07:39:41
Question: I'm using the Stanford CoreNLP library for my project. It uses the PTB tokenizer for tokenization. For a statement like this - go to room no. #2145 or go to room no. *2145 - the tokenizer splits #2145 into two tokens: # and 2145. Is there any way to configure the tokenizer so that it doesn't treat # and * as delimiters? Answer 1: A quick solution is to use this option: (command line) -tokenize.whitespace (in Java code) props.setProperty("tokenize.whitespace", "true"); This will cause the tokenizer to
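
Put into a runnable form, the suggested property looks like the sketch below. Keep in mind that with whitespace tokenization the tokenizer no longer separates punctuation for you, so the input needs to be pre-split accordingly.

    import java.util.List;
    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class WhitespaceTokenizeSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize");
            // Split only on whitespace, so "#2145" and "*2145" survive as single tokens.
            props.setProperty("tokenize.whitespace", "true");

            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            Annotation doc = new Annotation("go to room no. #2145 or go to room no. *2145");
            pipeline.annotate(doc);

            List<CoreLabel> tokens = doc.get(CoreAnnotations.TokensAnnotation.class);
            for (CoreLabel token : tokens) {
                System.out.println(token.word());
            }
        }
    }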

Spark Scala - java.util.NoSuchElementException & Data Cleaning

半腔热情 submitted on 2019-12-25 07:25:03
Question: I have had a similar problem before, but I am looking for a generalizable answer. I am using spark-corenlp to get sentiment scores on e-mails. Sometimes, sentiment() crashes on some input (maybe it's too long, maybe it has an unexpected character). It does not tell me that it crashed on some instances, and just returns the Column sentiment('email). Thus, when I try to show() beyond a certain point or save() my data frame, I get a java.util.NoSuchElementException because sentiment() must have
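
The question concerns spark-corenlp's sentiment('email) column, but the underlying failure can be contained generically by wrapping each per-document annotation in a try/catch and returning a sentinel value instead of letting the row fail. The Java sketch below illustrates that pattern; the scoreOrDefault helper and the character cap are my own additions rather than anything from spark-corenlp, and the SentimentClass annotation key is the one used in recent CoreNLP releases.

    import java.util.List;
    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;

    public class DefensiveSentimentSketch {
        private static final int MAX_CHARS = 10000; // arbitrary cap for very long e-mails

        // Returns the sentiment label of the first sentence, or a fallback if anything fails.
        static String scoreOrDefault(StanfordCoreNLP pipeline, String text, String fallback) {
            try {
                String clipped = text.length() > MAX_CHARS ? text.substring(0, MAX_CHARS) : text;
                Annotation doc = new Annotation(clipped);
                pipeline.annotate(doc);
                List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);
                if (sentences == null || sentences.isEmpty()) {
                    return fallback;
                }
                return sentences.get(0).get(SentimentCoreAnnotations.SentimentClass.class);
            } catch (Exception e) {
                return fallback; // swallow per-document failures instead of failing the whole job
            }
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            System.out.println(scoreOrDefault(pipeline, "I love this.", "unknown"));
        }
    }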

Error in Stanford POS Tagger

和自甴很熟 submitted on 2019-12-25 07:10:06
Question: Hello, I am trying to POS-tag a sentence using the Stanford POS tagger. I am using Python 3.4 and nltk 3.1 on Windows 7. The following is the code I used: import nltk from nltk.tag.stanford import POSTagger import os java_path = r"C:\Program Files\Java\jre1.8.0_66\bin\java.exe" os.environ['JAVAHOME'] = java_path St=POSTagger(r"C:\Python34\Scripts\stanford-postagger-2015-12-09\models\english-bidirectional-distsim.tagger", r"C:\Python34\Scripts\stanford-postagger-2015-12-09\stanford-postagger

CoreNLP on Apache Spark

谁都会走 submitted on 2019-12-25 06:14:45
Question: I'm not sure if this is related to Spark or NLP. Please help. I'm currently trying to run the Stanford CoreNLP library on Apache Spark, and when I try to run it on multiple cores I get the following exception. I'm using the latest NLP library, which is thread safe. This is happening during the map phase on the line pipeline.annotate(document); java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) at java.util.ArrayList$Itr.next(ArrayList.java
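
One common mitigation, offered as an assumption rather than a confirmed fix for this exact report, is to avoid sharing a single pipeline instance across worker threads and to build one per thread instead, for example via a ThreadLocal as sketched below.

    import java.util.Properties;

    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class PerThreadPipelineSketch {
        // Each worker thread lazily builds its own pipeline instead of sharing one instance.
        private static final ThreadLocal<StanfordCoreNLP> PIPELINE =
            ThreadLocal.withInitial(() -> {
                Properties props = new Properties();
                props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
                return new StanfordCoreNLP(props);
            });

        static void annotate(String text) {
            Annotation document = new Annotation(text);
            PIPELINE.get().annotate(document); // no mutable state shared across threads
        }

        public static void main(String[] args) throws InterruptedException {
            Runnable task = () -> annotate("Stanford CoreNLP runs inside each worker thread.");
            Thread a = new Thread(task);
            Thread b = new Thread(task);
            a.start();
            b.start();
            a.join();
            b.join();
        }
    }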

NER CRF, Exception in thread “main” java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory [duplicate]

蹲街弑〆低调 submitted on 2019-12-25 04:44:05
Question: This question already has answers here: Why am I getting a NoClassDefFoundError in Java? (23 answers). Closed 3 years ago. I have downloaded the latest version of the NER package from this link. After extracting it, I ran this command: java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop This is not working and I get the following exception: CRFClassifier invoked on Mon Jul 25 06:56:22 EDT 2016 with arguments: -prop austen.prop Exception in thread "main" java.lang
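
A frequent cause of this particular NoClassDefFoundError is that no slf4j jar is on the classpath. Assuming the extracted download ships the required jars in a lib directory (if it does not, add an slf4j-api jar there yourself), widening the classpath usually resolves it; on Windows, use ; instead of : as the separator.

    java -cp "stanford-ner.jar:lib/*" edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop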

Failed to execute goal

老子叫甜甜 submitted on 2019-12-25 04:29:11
Question: I'm new to Maven. I tried mvn clean, which worked successfully, but after mvn package I got the following error: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3:compile (default-compile) on project L: Compilation failure: Compilation failure: [ERROR] /home/user/L/src/main/java/edu/stanford/nlp/pipeline/NLP.java:[4,34] package edu.stanford.nlp.neural.rnn does not exist [ERROR] [ERROR] /home/user/L/src/main/java/edu/stanford/nlp/pipeline/NLP.java:[6,32] cannot find
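
The missing edu.stanford.nlp.neural.rnn package suggests the CoreNLP jar is not declared as a dependency of the project. A typical pom.xml entry is sketched below; the version shown is an assumption, so match it to the release you actually build against, and the models classifier entry is only needed if the model files are not provided some other way.

    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <version>3.6.0</version>
    </dependency>
    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <version>3.6.0</version>
      <classifier>models</classifier>
    </dependency>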

Stanford PTBTokenizer token's split delimiter

陌路散爱 submitted on 2019-12-25 02:07:37
Question: Is there a way to provide the PTBTokenizer with a set of delimiter characters on which to split a token? I was testing the behaviour of this tokenizer and I realized that there are some characters, like the vertical bar '|', for which the tokenizer divides a substring into two tokens, and others, like the slash or the hyphen, for which the tokenizer returns a single token. Answer 1: There's not any simple way to do this with the PTBTokenizer, no. You can do some pre-processing and post-processing to get what
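
Following the answer's pre/post-processing suggestion, one workaround is to tokenize with PTBTokenizer as usual and then re-split each token on a caller-supplied set of extra delimiter characters. The splitFurther helper below is hypothetical, not a PTBTokenizer feature; it is only a sketch of that idea.

    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;

    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.process.CoreLabelTokenFactory;
    import edu.stanford.nlp.process.PTBTokenizer;

    public class PostSplitSketch {
        // Hypothetical helper: re-split every PTB token on any of the given delimiter characters.
        static List<String> splitFurther(String text, String delimiters) {
            PTBTokenizer<CoreLabel> tokenizer = new PTBTokenizer<>(
                new StringReader(text), new CoreLabelTokenFactory(), "");
            List<String> out = new ArrayList<>();
            while (tokenizer.hasNext()) {
                String word = tokenizer.next().word();
                StringBuilder piece = new StringBuilder();
                for (char c : word.toCharArray()) {
                    if (delimiters.indexOf(c) >= 0) { // extra delimiter: close the current piece
                        if (piece.length() > 0) {
                            out.add(piece.toString());
                            piece.setLength(0);
                        }
                    } else {
                        piece.append(c);
                    }
                }
                if (piece.length() > 0) {
                    out.add(piece.toString());
                }
            }
            return out;
        }

        public static void main(String[] args) {
            // Treat '/' and '-' as extra delimiters on top of the normal PTB behaviour.
            System.out.println(splitFurther("a high-risk asset/liability split", "/-"));
        }
    }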